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Object  recognition  is  a central  problem  to  computer  vision.  This  dissertation 
describes  a novel  approach  that  recognizes  multiple  objects  embedded  in  their  natural 
surroundings  from  a 2D  gray  scale  image.  The  objects  recognized  may  be  translated, 
rotated  and  scaled  replicas  of  the  model  or  template.  The  keys  to  this  approach  are  the 
construction  of  a four  dimensional  hyper-image  space  and  the  creation  of  a measurement 
function  for  subspace  correlation  using  a phase-only  filter.  In  the  hyper-image  space,  the 
four  dimensions  correspond  to  translations  along  the  jc  and  y axes,  rotation  and  scaling, 
respectively.  Rotation  and  scale  changes  are  converted  to  linear  shifts  using  “conformal 
mapping.”  The  measurement  function  is  defined  as  the  maximum  value  of  the  subspace 
correlation  coefficients  obtained  using  phase-only  filtering.  This  function  provides  a 
reliable  judgment  to  detect  the  translation,  rotation  and  scaling  parameters  simultaneously. 
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Searching  for  the  objects  is  accomplished  using  the  peak  sharpness,  a measurement  of  the 
sharpness  of  the  maximum  of  the  subspace  correlation. 

To  achieve  a fast  processing  speed,  the  implementation  of  the  algorithm  employs 
multiscale  and  parallel  computation  techniques.  The  combination  of  both  techniques 
increases  the  processing  speed  significantly.  Using  an  nCUBE  2 supercomputer,  the 
processing  time  is  51  seconds  for  an  image  with  a size  of  512  x 512  pixels. 

Examples  of  utilization  of  the  method  in  automated  object  recognition  and  image 
registration  are  presented.  These  examples  demonstrate  that  this  approach  has  the 
following  salient  features:  (1)  The  shapes  of  the  objects  can  be  of  any  complex  form.  (2) 
The  result  of  the  recognition  is  relatively  insensitive  to  noise,  occlusion,  and  illumination 
changes.  Applications  of  the  method  to  some  real-world  problems  are  also  proposed, 
which  include  virtual  world  verification  and  reconciliation  for  radioactive  waste  cleanup 
and  aerial  and  satellite  imagery-based  virtual  reality.  The  initial  tests  of  the  applications 
illustrate  that  this  method  allows  the  integration  of  computer  vision  and  computer  graphics 
under  the  scheme  of  pattern  recognition.  This  approach  provides  solutions  to  computer 
vision  problems  that  are  very  difficult  to  solve  if  sensing  technology  is  used  exclusively. 

This  recognition  method  will  renew  the  research  interest  in  the  correlational  object 


recognition  method. 


CHAPTER  1 
INTRODUCTION 


Object  recognition  is  a rapidly  developing  area  in  computer  vision  and  artificial 
intelligence.  Its  applications  include  (1)  the  assembly  or  inspection  of  manufactured  parts; 
(2)  the  navigation  of  autonomous  vehicles  on  land,  in  the  air,  or  under  the  sea;  (3)  target 
recognition  for  security  systems;  (4)  the  analysis  of  microscopic  images  and  medical  x- 
rays;  and  (5)  law  enforcement.  Due  to  its  wide  application  potential,  research  in  the  area 
of  object  recognition  has  been  extensive[1]. 

Building  an  autonomous  object  recognition  system  is  one  of  the  most  challenging 
problems  in  computer  vision.  There  are  two  basic  approaches  to  this  problem:  feature- 
based  and  non-feature-based.  The  feature-based  approach  extracts  features  from  images, 
such  as  vertices,  lines,  or  surface  patches,  and  then  matches  them  against  the  model.  This 
type  of  approach  has  been  the  focus  of  research  in  computer  vision  for  the  last  decade, 
especially  since  improved  3D  sensing  techniques  such  as  laser  range  finders  have  been 
available.  However,  despite  their  successes,  today’s  feature-based  recognition  systems  are 
still  limited  to  recognizing  a handful  of  simple  object  models  in  clean,  uncluttered 
images121.  One  of  the  main  reasons  for  this  limitation  is  the  difficulty  of  the  segmentation 
that  provides  the  features  for  further  processing.  To  date,  there  is  still  no  robust 
segmentation  method  available  to  meet  the  requirements  of  current  recognition  systems121. 
It  is  even  questionable  whether  segmentation  and  feature  extraction  can  be  feasible  or 
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successful  without  the  recognition  of  objects.  It  may  be  more  reasonable  to  use 
segmentation  as  a post-processing  tool  instead  of  a pre-processing  tool131. 

The  non-feature-based  method  is  an  attractive  alternative  to  the  feature-based 
method  because  it  does  not  require  feature  extraction  and  the  shape  of  the  object  can  be  of 
any  complex  form.  The  best  developed  non-feature-based  methods  are  based  on 
geometrical  invariance,  which  include  the  moment  method  and  the  correlation  method. 

The  moment  method,  first  proposed  by  Hu[4],  uses  the  moments  of  the  object  as  the 
invariant  representation  to  recognize  objects.  Despite  the  latest  achievements  made  in 
using  the  moment  invariant  method  to  recognize  occluded  objects[5],  this  method  still 
requires  that  objects  reside  in  a clean  background  which  is  impractical  for  most 
applications.  Some  other  invariant  methods  like  the  Group  theoretical  methods161  suffer 
from  the  same  problems  as  that  of  the  moment  method. 

The  correlation  method  is  the  basis  for  optical  pattern  recognition171.  In  1976, 
Casasent  and  Psaltis181  suggested  using  a combination  of  the  Fourier  transformation  and 
the  Mellin  transformation  to  accomplish  a position,  rotation,  and  scale  invariant  optical 
correlation.  The  magnitude  of  the  Fourier  transform  is  translation  invariant;  and  the 
magnitude  of  the  Mellin  transform  is  rotational  and  scaling  invariant.  The  Mellin 
transformation  can  be  viewed  as  a Fourier  transformation  in  a polar-logarithmic 
coordinate  system,  where  the  signal  is  sampled  exponentially  along  the  radius.  A matched 
filter  is  then  applied,  and  the  object  is  detected  by  measuring  the  correlation  peak  on  the 
output  plane.  A large  amount  of  research  has  been  done  since  then  to  improve  their 
results.  Because  the  matched  filter  often  generates  a broad  peak,  many  new  filters  have 
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been  proposed,  such  as  phase-only  filters'91,  inverse  filters'101,  fractional  power  filters'111, 
pure  phase  correlation'121,  and  joint  transform  correlators'13'141.  Despite  the  improvement  of 
optical  hardware  and  better  filtering  methods,  it  seems  that  no  significant  progress  has 
been  made  after  Casasent  and  Psaltis’  initial  work  in  the  sense  that  no  reported  work  in 
this  category  can  detect  an  object  in  its  natural  surroundings.  In  1988,  Wechsler  and 
Zimmerman'151  applied  the  conformal  mapping'161  directly  to  the  input  image  itself  instead 
of  to  its  Fourier  transform,  then  used  the  magnitude  of  the  Fourier  transform  of  the 
conformal  mapped  image  as  the  rotation  and  scale  invariant.  The  invariant  generated  from 
the  object  was  matched  to  the  template  utilizing  a neural  network.  However,  this  method 
works  only  if  the  origin  of  the  conformal  mapping  of  the  object  is  the  same  as  the  origin  of 
the  conformal  mapping  of  the  template'171,  and  the  origin  for  the  object  is  usually  unknown 
before  the  object  is  recognized  and  located.  Wang  et  a/'18'191  developed  an  algorithm  to 
accomplish  the  recognition  in  two  steps:  rotation  and  scale  detection  followed  by 
translation  detection.  In  the  first  step,  the  magnitude  of  the  Fourier  transform  is  used  as 
the  translation  invariant,  and  this  invariant  is  transferred  to  a polar-logarithmic  coordinate 
system  to  convert  the  rotation  and  scale  into  linear  shifts.  Rotation  and  scale  are  then 
detected  using  an  inverse  filter.  In  the  second  step,  the  template  is  regenerated  to  the 
correct  size  and  orientation  using  the  rotation  and  scale  parameters  obtained  from  step 
one.  Next,  inverse  filtering  is  applied  again  to  get  the  translation  parameters.  Although  this 
method  works  well  for  one  object  in  a clean  image,  it  fails  whenever  a background  is 
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The  inability  to  recognize  objects  in  a cluttered  environment  is  the  reason  why  the 
non-feature-based  method  has  languished  while  the  feature-based  method  is  thriving.  It  is 
interesting  to  see  what  causes  both  the  moment  method  and  the  correlation  method  to  fail. 
The  problem  is  that  both  methods  employ  certain  kinds  of  global  geometrical  invariance. 
Although  utilizing  invariance  or  pseudo  invariance  is  the  major  reason  for  many  advances 
in  computer  vision,  the  global  geometrical  invariant  has  little  chance  of  success  in  non- 
feature-based  methods  for  several  reasons.  First,  the  invariant  is  a global  operator  which 
has  to  be  a type  of  integration  or  summation  in  the  entire  space  to  cancel  certain  local 
properties.  The  globalization  of  the  invariant  operation  implies  that  the  background  and 
noise  are  added  to  the  entities  of  the  object.  This  addition  causes  the  distortion  of  the 
object  entities  and  then  matching  fails.  One  solution  to  the  problem  is  to  eliminate  the 
background  before  processing,  as  is  done  by  many  researchers.  However,  separation  of 
the  object  from  the  background  is  a difficult  problem,  if  not  unsolvable,  without 
recognition  of  the  object  first.  Secondly,  the  invariant  reduces  the  dimensions  of  the 
variable  space  by  canceling  some  variables,  therefore  it  can  not  be  linear.  The  author’s 
experiments  with  the  magnitude  of  the  Fourier  transform  and  the  magnitude  of  the  Mellin 
transform  show  that  the  nonlinearity  of  the  invariant  causes  the  signals  to  overlap, 
although  they  are  separated  before  the  invariant  operations.  This  nonlinear  problem  cannot 
be  solved  using  filtering  methods  or  any  other  methods  which  are  based  on  the  linear 
system  theory.  In  addition,  invariant-based  methods  cannot  detect  the  movement 
parameters  that  are  often  necessary  for  many  applications,  such  as  autonomous  vehicle 
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navigation  and  robot  manipulation.  Therefore,  it  is  necessary  to  explore  avenues  other 
than  geometrical  invariance. 

In  1980,  Hester  and  Casasent[201  suggested  that  recognition  can  be  achieved  by 
using  a linear  combination  of  reference  images  to  create  a composite  image  and  then 
cross-correlating  the  inputs  with  this  one  composite  image.  For  example,  if  the  distortion 
is  an  in-plane  rotation  and  N-training  images  are  used  to  represent  this  distortion,  the 
composite  image  can  be  a linear  combination  of  images  obtained  from  rotating  the  target 
image  by  0,  2nlN, ...,  2n{N-l)IN  rad,  respectively.  Once  the  linear  combination  weights 
are  calculated,  the  composite  image  is  synthesized  on  a digital  computer  or  in  an  optical 
laboratory  by  using  multiple-exposure  hologram  techniques. 

One  method  that  outperforms  many  other  composite  filter  schemes  is  the  Minimum 
Average  Correlation  Energy  filter  (MACE)t21].  This  method  directly  filters  in  the 
frequency  domain,  which  allows  designers  to  enforce  constraints  such  as  the  filter  being  a 
phase-only  filter.  In  addition,  the  MACE  filter  minimizes  the  average  correlation  plane 
energy,  which  keeps  the  sidelobes  in  the  correlation  plane  as  low  as  possible. 

Another  method  proposed  by  Kallman[22]  deals  with  a two-class  problem.  This 
method  defines  a Signal-to-Noise-Ratio  as  peak  divided  by  clutter.  An  explicit 
maximization  of  the  ratio  is  hard  to  do.  Instead,  Kallman  suggests  maximizing  the  peak 
with  respect  to  a composite  image  and  then  adjusting  the  weight  of  the  composite  image 
to  reduce  the  clutter.  Kallman  has  applies  this  basic  method  to  the  design  of  Phase-Only 
Filters  and  Binary  Phase-Only  Filters1231. 
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However,  the  question  arises  as  to  whether  it  is  necessary  to  make  the  in-plane 
rotation  problem  into  an  /V-parameter  estimation  problem  when  there  is  only  one  rotation 
angle  that  needs  to  be  determined.  It  is  obvious  that  the  more  types  of  distortion  to  be 
included,  the  more  images  the  composite  image  would  contain.  Therefore  this  approach 
leaves  little  room  for  solving  a true  3D  recognition  problem.  Moreover,  despite  many 
elegant  algorithms  in  this  category124',  the  current  composite  filter  method  still  experiences 
difficulties  in  recognizing  an  object  when  the  background  is  present. 

If  the  goal  is  to  recognize  objects  with  planar  distortions  that  may  be  translation, 
rotation  and  scaling,  the  main  challenge  utilizing  the  non-feature-based  method  is  the 
recognition  of  objects  embedded  in  their  natural  surroundings  or  corrupted  by  noise.  From 
the  above  discussion,  a conclusion  can  be  drawn  that  both  reducing  the  dimensions  of  the 
recognition  problem  by  using  invariance  and  expanding  its  dimensions  to  a large  N using 
composite  filters  are  inappropriate  approaches.  The  remaining  possible  option  is  to  solve 
this  problem  in  a four-dimensional  space,  where  the  four  dimensions  correspond  to 
translations  along  the  x and  y axes,  rotation  about  z axis,  and  scale  changes.  This 
dissertation  describes  a novel  object  recognition  scheme  in  such  a four-dimensional  hyper- 
image space.  In  this  space,  the  rotation  and  scale  changes  of  the  object  are  converted  to 
linear  shifts  using  conformal  mapping1161,  and  therefore  any  linear  system  theory  is 
applicable.  The  recognition  is  accomplished  by  searching  for  the  maximum  value  of  a new 
measurement  function.  This  measurement  function  is  defined  as  the  maximum  value  of  the 
subspace  correlation  coefficients  obtained  using  phase-only  filtering.  The  coordinates  of 
the  maximum  value  of  this  measurement  function  give  the  translation  parameters.  The 
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coordinates  of  the  maximum  of  the  subspace  correlation  associated  with  the  maximum 
value  of  the  measurement  function  indicate  the  rotation  and  scale  changes.  The  existence 
of  the  object  is  determined  using  the  peak  sharpness  to  judge  how  sharp  the  maximum  of 
the  subspace  correlation  is.  This  recognition  scheme  is  mathematically  well  defined  and  its 
implementation  is  very  simple. 

Besides  its  theoretical  value,  the  approach  improves  the  existing  methods  in  terms 
of  its  distinguishing  results.  It  detects  multiple  objects  in  their  natural  surroundings 
without  any  preprocessing.  The  object  can  be  of  any  complicated  shape  or  form. 
Recognition  is  insensitive  to  occlusion,  noise,  and  illumination  changes.  The  translation, 
rotation  and  scaling  parameters  are  detected  simultaneously  while  recognizing  the  objects. 
To  the  best  of  the  author’s  knowledge,  there  is  no  other  method  to  date  that  achieves  such 
results. 

The  processing  time  is  an  important  factor  to  evaluate  an  algorithm.  The  parallel 
data  structure  of  the  method  provides  advantages  for  implementing  it  on  a parallel 
computer.  Using  64  processors  of  a supercomputer  nCUBE  2 and  a multiresolution 
technique,  the  processing  time  is  51  seconds  for  an  image  with  a size  of  512  x 512  pixels. 

This  recognition  method  opens  a new  avenue  for  solving  computer  vision 
problems.  Particularly  its  ability  to  utilize  imagery  as  a template  provides  a natural  way  to 
generate  a template  using  CAD  tools.  This  ability  joins  the  computer  vision  and  computer 
graphics  using  the  scheme  of  object  recognition.  Therefore,  many  traditional  computer 
vision  problems  can  be  solved  by  utilizing  a CAD-generated  model  to  achieve  the  desired 
results.  This  dissertation  presents  two  applications:  virtual  world  updating  for  radioactive 


8 


waste  cleanup  and  aerial  image-based  virtual  reality.  The  initial  tests  demonstrate  the 
potential  of  the  method,  and  also  show  the  urgency  of  extending  the  method  to  3D  for 
finally  solving  these  real-world  problems. 

The  rest  of  the  dissertation  is  organized  as  follows:  Chapter  2 describes  the 
recognition  scheme  in  detail;  Chapter  3 presents  different  computer  implementation 
schemes;  Chapter  4 shows  the  results  of  experimental  examples  and  illustrates  how  to 
apply  the  method  to  object  recognition  and  image  registration  problems;  Chapter  5 
describes  how  this  recognition  method  is  used  to  provide  solutions  for  real-world 
applications;  Chapter  6 presents  the  conclusions  drawn  from  this  research. 


CHAPTER  2 

OBJECT  RECOGNITION 


As  discussed  in  the  previous  Chapters,  the  correlation  method  initiated  by  Casasent 
and  Psaltis181  has  many  advantages  such  as  no  need  for  feature  extraction  and  insensitivity 
to  noise  and  illumination  changes.  However,  utilizing  the  invariant  disables  it  when 
background  is  present.  This  research  evolves  from  their  work  with  adding  two  major 
improvements:  1)  using  a 4D  hyper-image  space  instead  of  the  invariant  as  the 
representation  of  the  image,  and  2)  developing  a 2D  subspace  matching  strategy  instead  of 
a direct  4D  correlation  to  recognize  objects.  The  2D  matching  strategy  consists  of  phase- 
only  filtering  in  2D  subspaces  and  searching  for  the  maximum  of  a new  measurement 
function. 

In  this  Chapter,  Section  2. 1 introduces  the  construction  of  the  four  dimensional 
hyper-image  space;  Section  2.2  presents  the  phase-only  filter;  Section  2.3  defines  the 
measurement  function;  and  Section  2.4  shows  how  to  detect  the  existence  of  the  object 
using  peak  sharpness. 


2. 1 Four  Dimensional  Hvper-Image  Space 
Planar  translations,  rotations  and  scale  changes  of  an  object  form  a 4D  space. 
However,  the  behaviors  of  the  four  parameters  are  quite  different  (i.e.  in  the  Cartesian 
coordinate  system  the  planar  translations  are  linear  shifts;  in  the  polar  coordinate  system 
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the  rotation  is  a linear  shift;  and  in  the  logarithmic  coordinate  system  the  scale  change 
becomes  a linear  shift).  In  order  to  apply  the  Fourier  analysis,  the  4D  space  must  include 
these  coordinate  systems  to  convert  rotation  and  scaling  into  linear  shifts  as  translations  in 
the  Cartesian  coordinate  system. 

Given  an  image  fix,  y),  (-N/2  <x,y  <N/2),  where  N is  the  size  of  the  image,  and 
denoting  its  four  dimensional  representation  with  a subscript  4,  the  hyper-image 
f4(x,y,x’,y’)  can  be  constructed  as 

Mx,y,x',y')  = f(x  + x',y  + y')  (1) 

where  the  coordinates  ( x’,  y’),  (-N/2  < x’,y’<  N/2)  represents  a point  on  the  image  plane. 
Changing  ( x ’,  y’)  from  the  Cartesian  coordinate  system  to  the  polar  coordinate  system 
(0,r),  f4  ( x , y,  x’,y’)  becomes 

/4  (x,  y , 0,  r)  = f(x  + r cos  0,  y + r sin  0)  (2) 

where 


■ = yl(x'-x)2+(y'-y)2. 


(3) 


and 


0=  arctan 


yX-X  j 


(4) 


Changing  the  radius  r to  an  exponential  grid,  that  is,  substituting  l = In  r for  variable  r 
allows  Eqn.(2)  to  be  written  as 

/4  (x,  y,  0,  l ) = f(x  + e1  cos  0,  y + el  sin  0)  (5) 


The  purpose  of  the  expansion  to  another  two  dimensions  and  the  geometric 
transformation  from  the  original  Cartesian  coordinate  system  ( x’,  y’)  to  the  polar- 
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logarithmic  coordinate  system  (0,  /)  is  to  convert  the  rotation  and  scaling  into  linear 
shifts1161. 

As  an  aid  in  visualizing  this  4D  hyper-image  space,  one  can  view  this  as  a 
“mushroom  field”,  where  at  every  point  ( x , y ) a mushroom  grows,  whose  top  is  the 
original  image  f(x’,  y’)  expressed  in  the  polar-logarithmic  coordinate  system 
f(el  cos  0,  el  sin  0)  with  the  origin  located  at  ( x , y).  Figure  2. 1 illustrates  an  example  of 
this  “mushroom  field”  using  a square  as  the  object.  Figure  2.1(a)  is  the  original  image  in  a 
Cartesian  coordinate  system  whose  origin  is  the  center  of  the  image.  The  x axis  is 
horizontal  and  the  y axis  is  vertical,  with  a range  -32<r,y<32.  The  boundaries  of  the 
square  are  x = -16,  x = 16,  y = -16,  and  y = 16,  respectively.  Figure  2.1  (b)  to  (f)  display 
“mushrooms”  located  at  (0,  0),  (0,  17),  (0,  -17),  (-17,  0),  (17,  0),  respectively.  A polar- 
logarithmic  coordinate  system  is  attached  to  them.  The  origin  of  the  coordinate  system  is 
the  center  of  the  image.  The  0 axis  represents  the  angular  coordinate,  which  is  horizontal 
and  lies  in  the  range  -n  < 0 < 7t . The  l axis  represents  the  radius,  which  is  vertical  with  a 
range  -In32</<ln32.  Using  Eqn.(5),  Figure  2.1(b)  to  (f)  can  also  be  written  as 
/4(O,O,0,/),/4(O,  17,  0,  /),/,( 0,  -17,  0,  l),f4(- 17,0,0,  /),  and  /,(  17,  0,  0,  l),  which  are 
subspaces  of  rotation  and  scaling. 

Denoting  t(x,  y)  as  a template,  in  the  4D  space  it  can  be  written  as 
f4  (x,  y,  0,  /)  = t(x  + el  cos  Q,y  + e‘  sin  0) , (-7t<0<7t,  -lny^/clny)  (6) 

If  a scene  image /contains  the  template  that  is  rotated  with  an  angle  0O  , scaled  by 
a factor  1 Ik,  and  translated  to  (x0y0),  in  the  Cartesian  coordinate  system  then  f(x  \ y’)  can 
be  expressed  as 
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(a)  f4(x,y)  (b)  /4(O,O,0,  Z)  (c)  /,(O,17,0,  l) 


Figure  2.1.  An  example  of  the  four  dimensional  hyper  image  space,  (a)  is  the  original 
2D  image  containing  a square,  (b)  to  (f)  are  “mushroom”  growing  at  the  center,  top, 
bottom,  left  and  right  side  of  the  square  respectively. 
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jc'cos0o +ysin0o  -x'  sin  0n  + v'  cos  0n 

f (*•?)  = «: -J1 £— -*6. Y2 —y«)  + b(x\y) 


(7) 


where  b (x’,  y’)  represents  the  background  and  (-y  < x',y'<  y).  In  Eqn.  (7),  it  is 

assumed  that  only  one  object  appears  for  simplicity.  However,  it  is  easy  to  extend  Eqn.(7) 
to  the  more  general  case  of  multiple  objects.  If 

x'=rcos0,  /=rsin0,  / = lnr,  /0  = ln& 

then  Eqn.  (7)  becomes 

f(el  cos 0, el  sin 0)  = t(eH)  cos(0 - 0O ) - x0 , el~l°  sin(0 -%)-y0)+b(el  cos 0, el  sin 0)  (8) 

where  -n<Q<n  , and  -lny  </<  lny . To  add  another  two  dimensions,  Eqn. (8)  is 
rewritten  as 

f(x  + el  cos  0,  y + e1  sin  0) 

-t((x-x0)  + el  cos(0-0o),(y-yo)  + e/  lf)  sin(Q-Q0))  + b(x  + el  cosQ,y  + el  sinQ)  (9) 


where  -y  < x,  y < y . In  the  4D  space,  therefore,  the  scene  can  be  expressed  as 

Mx,y,Q,l)=t4(x-x0,y-y0,e-Q0,l-l0)  + b4(x,y,Q,l)  (10) 

Eqns.(6)  and  (10)  provide  mathematical  descriptions  of  the  template  and  the  scene 
in  the  4D  space.  Although  it  is  possible  to  implement  a recognition  method  directly 
matching  Eqn. (10)  to  Eqn. (6)  in  this  4D  space,  a recognition  strategy  which  decomposes  a 
4D  space  match  into  two  sequential  searches  in  2D  subspaces  is  more  desirable.  This 
approach  not  only  greatly  simplifies  the  implementation,  but  also  lowers  the  excessive 
memory  need  required  for  direct  4D  matching. 
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To  develop  a 2D  subspace  matching  strategy,  it  is  helpful  to  reexamine  the 
example  shown  in  Figure  2.1.  If  the  square  in  Figure  2.1(a)  is  the  scene /(x,  y),  then  the 
“mushrooms”  from  Figure  2. 1 (b)  to  (f)  are  subspaces  in  the  0-/  plane.  Figure  2.2  shows 
the  template  t(x,  y)  in  (a)  and  one  of  its  subspace  t4(0,  0,  0,  /)  in  (b).  Comparing  Figure  2.2 
(b)  with  Figure  2. 1 (b)  to  (f),  it  is  clear  that  Figure  2.2  (b)  is  a shifted  replica  of  Figure  2. 1 
(b),  but  totally  different  from  Figure  2. 1 (c)  to  (f).  Therefore,  if  every  “mushroom”  or  2D 
subspace  of f4  in  the  entire  4D  space  is  compared  with  the  template  ^(0,0,0,  /),  there  is 
only  one  “mushroom”  that  matches  with  the  template  and  all  others  will  be  different.  This 
fact  implies  that  the  recognition  can  be  accomplished  in  two  steps:  1)  locate  a 
“mushroom”  that  contains  the  shifted  replica  of  the  template  from  all  “mushrooms”  using 
a criterion,  and  2)  find  the  rotation  and  scaling  parameters  from  this  correct  “mushroom”. 
In  section  2.2,  a phase-only  filter  will  be  presented,  which  is  the  foundation  for  the 
criterion;  In  Section  2.3,  a measurement  function  will  be  defined  as  the  criterion  and 
matching  strategy  will  be  described. 


(a)  f(x’y)  (c)  /4(O,O,0,  /) 


Figure  2.2.  Template  of  a square  signal,  (a)  is  the  original  template,  (b)  is  the  2D 
subspace  representation  of  the  template.  It  is  also  a “mushroom”  growing  at  the  center  of 
the  template  image. 
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2.2  Phase-Only  Filtering  in  a 2D  Subspace 

The  goal  of  this  section  is  the  development  of  a 2D  subspace  matching  filter. 
Although  many  different  types  of  filters  can  be  used  to  accomplish  matching,  a phase-only 
filter  is  chosen  for  its  better  discrimination  ability  and  stability,  which  will  be  explained 
later  in  this  section. 

In  1982,  Hayes1251  examined  the  role  played  by  the  phase  and  amplitude  in  the 
Fourier  domain  in  the  transmission  of  a continuous  tone  picture.  His  finding  was  that  the 
phase  information  is  considerably  more  important  than  the  amplitude  information  in 
preserving  the  visual  intelligibility  of  the  picture.  He  developed  an  algorithm  with  which 
the  original  picture  can  be  reconstructed  using  the  phase  information  alone.  In  1984, 
Homer  and  Gianino191  proposed  a phase-only  filter  for  pattern  recognition.  This  filter  has 
100%  optical  efficiency  and  a greatly  improved  ability  to  discriminate  compared  with  the 
classical  matched  filter.  An  explanation  for  the  better  discrimination  abilities  of  the  phase- 
only  filter  is  that  it  acts  as  a high  pass  filter.  It  enhances  the  high  frequencies  where  the 
differences  between  objects  and  background  are  in  general  greater1261. 

Using  u.  and  v to  denote  frequencies  along  the  0 and  / axes,  and  using  upper  case 
letters  to  denote  the  Fourier  transform  of  a function,  the  Fourier  transform  of  the  template 
in  the  subspace  is  defined  as 

T4(x,y,u,  v)  = ||  t4(x,y,Q,l)e~j2n{u^vl)dQdl  (11) 

where  j2  = -1.  If  x and  y are  digitized  as  integers,  Eqn.(l  1)  represents  NxN  2D  subspace 
templates.  For  the  purpose  of  subspace  matching,  only  one  of  them  is  necessary.  The 
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subspace  7X0, 0,0,  l)  is  chosen  as  the  template  for  convenience.  The  frequency  response  of 
the  phase-only  filter  using  this  template  is  then  defined  as[27] 


H (u  v)=M0M=^(«,v) 

^ j ir4(o,o,M,v)i 


(12) 


where  superscript  * denotes  the  complex  conjugate  and  cp  ( u , v)  is  the  phase  of  T4( 0,0,  u,v ). 
Eqn.(12)  shows  that  the  phase-only  filter  is  stable  because  of  its  unit  magnitude.  This 
property  is  not  displayed  by  some  other  high  discriminating  filters  like  the  inverse  filter[28]. 
The  subspace  Fourier  transform  of f4  (x,  y,  0,/)  is  given  by 


F4{x,y,u,v)  = \\  f4(x,yAl)e-j2niu<hvl)dQdl  (13) 

After  filtering,  the  correlation  G in  the  frequency  domain  is 

G(x,  y,  u,  v)  = F4  (x,  y,  u,  v)H^  ( u , v)  (14) 


Applying  the  inverse  Fourier  transformation  to  Eqn.  (14),  the  correlation  g(x,  y,  0,  /)  is 
obtained 

g(x,y,  0,/)  = ll  G(x,y,u,v)ej2K(u^vl)dudv  (15) 


Taking  the  magnitude  of  Eqn.  (15)  gives  the  correlation  coefficients. 


2.3  Measurement  Function 

The  geometrical  meaning  of  I g(x,  y,  0,  l)  I is  that  it  represents  the  correlation 
coefficients  of  the  input  scene  and  template.  From  the  analysis  at  the  end  of  Section  2.1,  it 


can  be  expected  that 
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I £(*o.yo»®k>.  <!>)•=  max  I g(x,  y,  0,  /)l  (16) 

x.y, 6,1 

where  x0,  yo,  0o,  lo  are  the  translations,  rotation  and  scaling  of  the  object  relative  to  the 
template.  In  other  words,  the  correlation  coefficient  I g(x,  y,  0,  l)  I is  the  maximum  if  the 
template  matches  the  object  in  the  scene.  For  instance,  Figure  2.2(b)  matches  Figure 
2.1(b)  with  zero  translations  and  zero  rotation,  but  a scaling  factor  of  l0  = In  2. 

Now  the  challenging  is  to  develop  a searching  strategy  to  find  \g(x0,  y0,  0o,  lo) I and 
its  coordinates  ( xo , yo,  0o,  lo)  in  the  4D  space.  If  Eqn.(16)  is  rewrite  as 

lg(*o»;yo>*Wo),=  max(max  I g(x,  y,  0,  /)l ) (17) 

x.y  e,/ 

Then  it  can  be  seen  from  Eqn.(17)  that  one  way  to  accomplish  the  search  for  the  maximum 
is  to  find  the  maximum  values  of  I g I in  the  0-/  plane  and  then  perform  another  search  in 
the  x-y  plane.  In  order  to  do  so,  a measurement  function  m(x,y)  is  defined  as 

m(x,  y)  = maxi  g(x,  y,  0,  /)l  (18) 

0,/ 

The  above  definition  shows  that  at  point  ( x , y),  m(x,y)  represents  the  maximum  of  the 
subspace  correlation  coefficients,  or  in  other  words,  the  maximum  of  the  correlation 
coefficients  in  the  “mushroom”  growing  at  this  point.  Searching  for  the  maximum  value  of 
m(x,y)  gives 

™(*o  . ?o ) = max  "*(■*>  y)  £(*o . . 0O . lo  )•  (19) 

x.y 

Therefore  the  translation  parameters  can  be  obtained  by  computing  max  m(x,  y) . The 

x.y 

coordinates  of  the  maximum  m(x0,yo ) indicates  the  translations,  or  where  the  object  is 
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located.  Once  the  location  of  the  object  is  known,  the  correct  subspace  or  “mushroom” 
can  be  chosen  to  determine  the  rotation  and  scaling  parameters  using  the  following 
equation: 


1 g(x0 , y0 , 0O , l0  )l  = max  I g(x0 ,y0,Q,l) I (20) 

0,/ 

The  coordinates  (0O,  l0)  of  1 g(x0,  yo,  0o,  lo) I in  the  0-/  plane  signify  the  rotation  and  the 
scaling  change. 

The  implementation  of  the  recognition  scheme  is  straight  forward  and  works  as 
follows: 

1 ) construct  the  phase-only  filter  using  Eqn.(  1 2), 

2)  construct  the  4D  hyper-image  space  for  the  image  and  compute  its  subspace  Fourier 
transformation  using  Eqn.(13), 

3)  calculate  m(x,  y)  utilizing  Eqns.(14),  (15)  and  (18), 

4)  search  for  the  maximum  value  m{x0,  yo),  whose  location  (x0t  yo)  gives  the  translations 
of  the  object, 

5)  compute  I g(x0,yo,Q,  l)  I in  Eqn.(20)  and  search  for  its  maximum  value.  The  coordinates 
(0O,  lo)  of  the  maximum  of  I g(x0,  yo,  0,  l)  I gives  the  rotation  and  scaling. 


2.4  Peak  Sharpness 
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In  the  last  section,  the  problems  of  how  to  locate  an  object  and  how  to  determine 
its  rotation  and  scaling  have  been  solved,  with  an  assumption  that  the  object  is  present  in 
the  input  image.  However,  the  question  of  how  to  determine  the  existence  of  an  object 
must  be  answered  for  automated  object  recognition.  To  address  this  problem,  the  concept 
of  peak  sharpness  is  introduced. 

The  peak  sharpness  is  an  attribute  used  to  measure  how  large  a jump  a correlation 
maximum  has  with  respect  to  the  average  of  the  correlation  coefficients.  The  peak 
sharpness  is  used  as  a performance  criterion  for  correlation  filters  in  optical  pattern 
recognition.  Among  the  different  formulas  defined  in  the  literature,  one  that  is  analytically 
tractable  is  the  peak-to-correlation  energy  ( PCE ): [U1 


where  n is  the  number  of  the  maximums  and  E is  the  correlation  energy  given  by 


PCE  is  also  called  discriminative  signal-to-noise  ratio1291.  It  provides  a means  to  evaluate 
the  quality  of  a correlation  response.  From  Eqn.  (21)  it  can  be  seen  that  a large  PCE  value 
means  the  peak  or  the  maximum  of  the  correlation  is  low  compared  with  the  energy  of  the 
correlation  coefficients,  which  indicates  the  absence  of  the  object;  and  a small  PCE  value 
indicates  the  peak  is  high,  which  signifies  the  existence  of  the  object.  It  should  be  noticed 
that  before  Eqn.  (21)  is  applied  to  evaluate  the  correlation  peak,  the  correlation 
coefficients  must  be  normalized  so  that  a comparable  value  of  PCE  may  be  obtained.  In 
this  research  the  correlation  is  normalized  to  a range  from  0 to  255  for  the  convenient 
display  of  coefficients  as  an  image  on  a digital  computer. 


E 


(21) 


E - If  I g(x0 , y0 , 0,  /)l2  dddl 


(22) 
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The  effectiveness  of  the  peak  sharpness  for  determination  of  the  existence  of  the 
object  is  demonstrated  by  observing  the  recognition  results  shown  in  Chapter  4.  In  the 
cases  where  the  object  exists,  there  must  be  a sharp  peak  in  the  subspace  correlation 
coefficients  I g(xo,  yo,  0,  /) I,  which  gives  a smaller  PCE.  For  example,  in  the  results  for 
clean  images  like  Figure  4.1(c)  and  4.2(c),  there  is  a bright  dot  present,  and  the  values  of 
the  peak  sharpness  are  19.768  for  the  square  image  and  18.122  for  the  triangle  image.  In 
contrast,  when  the  object  does  not  exist  in  the  input  image,  like  the  example  illustrated  in 
Figure  4.3(c),  the  value  of  the  peak  sharpness  increases  to  24.481,  and  there  is  not  a single 
bright  dot  appearing.  When  the  image  is  cluttered,  the  peak  sharpness  is  higher  overall, 
whether  the  object  is  present  or  not.  But  the  differences  of  peak  sharpness  between  the 
cases  of  existence  and  absence  of  the  object  can  still  be  clearly  identified.  For  example,  in 
Figures  4.4(c),  4.5(b)  and  4.5(e),  Figure  4.6(c)  and  (d),  where  the  object  is  present,  the 
values  of  the  peak  sharpness  are  in  a range  from  20.179  to  25.035.  However,  in  Figure 
4.7(c),  where  there  is  not  a single  bright  dot  in  \g(x0,  y0,  0,  /) I,  the  peak  sharpness  is 
increased  to  30.186.  According  to  these  observations,  when  the  type  of  the  environment  is 
known,  a threshold  can  be  selected  for  the  peak  sharpness  based  on  a statistical  analysis  of 
a large  number  of  experiments.  If  a value  of  the  peak  sharpness  is  higher  than  the 
threshold,  the  object  is  present;  otherwise  the  object  is  absent. 

It  should  be  emphasized  that  the  peak  sharpness  is  only  a relative  measurement 
based  on  the  statistical  evaluation  of  the  subspace  correlation  result.  The  threshold  of  the 
peak  sharpness  will  be  changed  with  the  size  of  the  image,  the  type  of  the  environment,  the 
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number  of  objects,  and  the  degree  of  occlusion  in  the  input  image.  Therefore,  to  ensure 
that  the  peak  sharpness  indicates  the  truth,  some  post  processing  technique  or  a better 


No. 

Figure 

Peak 

Sharpness  ( PCE) 

Objects 

1 

3.4 

23.699 

Yes 

2 

3.5(b) 

24.132 

Yes 

3 

3.5(e) 

22.938 

Yes 

4 

3.6(c) 

20.179 

Yes 

5 

3.6(d) 

25.035 

Yes 

6 

A.l 

20.023 

Yes 

7 

A.2 

20.950 

Yes 

8 

3.7 

30.186 

No 

9 

A.3 

25.419 

No 

10 

A.4 

27.664 

No 

11 

A.5 

29.550 

No 

12 

A.6 

25.200 

No 

Table  2.1.  Summary  of  the  experiments  on  the  peak  sharpness  threshold 
criterion  must  be  developed.  To  exam  the  range  of  the  peak  sharpness  under  different 
conditions,  several  experiments  have  been  conducted.  Table  2. 1 summarize  the  results  of 
the  experiments  that  include  1 2 examples  in  a typical  indoor  environment. 

One  measure  that  can  be  taken  to  significantly  improve  the  performance  of  the 
peak  sharpness  as  well  as  overall  recognition  result,  is  to  increase  the  image  resolution. 


22 


Generally  speaking,  the  larger  the  image  size,  the  better  the  resolution,  the  higher 
performance  of  the  peak  sharpness. 


CHAPTER  3 

COMPUTATIONAL  IMPLEMENTATION 


The  implementation  of  the  recognition  method  described  in  the  last  chapter  is 
simple.  The  most  straight  forward  way  is  to  follow  the  steps  listed  in  Section  2.3  directly 
and  implement  them  on  a serial  machine.  However,  due  to  the  intensive  computation 
involved  in  this  method,  this  basic  implementation  may  be  too  slow  for  many  practical 
applications.  A faster  speed  is  desirable  for  any  applications,  in  particular  for  robotics 
sensing,  because  the  robot  control/visualization  systems  need  real-time  sensory  feedback 
of  the  environment  for  task  planning  and  execution.  Fortunately,  the  data  structure  of  the 
proposed  method  is  suitable  for  multiscale  and  parallel  implementation,  which  are  the  two 
most  powerful  techniques  to  increase  the  processing  speed.  By  applying  multiscale  and 
parallel  computing  techniques,  a near-real  time  speed  is  achieved. 

This  chapter  will  discuss  the  basic  implementation,  multiscale  implementation,  and 
parallel  implementation.  For  each  implementation,  a detailed  description  and  flow  charts 
are  supplied  in  order  to  help  the  readers  follow  the  instructions  and  develop  their  own 
computer  program.  In  this  research  the  author  used  C and  UNIX  workstations  as  software 
and  hardware  platforms.  A complete  list  of  the  C code  is  given  in  Appendix  B. 
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3.1  Basic  Implementation 

The  basic  implementation  of  the  algorithm  is  described  in  Section  2.3.  This 
Section  will  discuss  some  practical  considerations  and  provide  more  computational  details. 
Define  an  image  transformation  block  as  shown  in  Figure  3.1.  Corresponding  to  steps  1) 
and  2)  listed  in  Section  2.3,  this  block  will  be  used  to  construct  the  phase-only  filter  and 
4D  hyper-image  space.  In  Figure  3.1,  the  first  operation  is  the  generation  of  a new  image 
from  the  input  image  by  setting  a new  center  at  point  ( x , y).  To  ensure  that  the  new  image 
has  the  same  size  as  the  input  image,  the  part  of  the  new  image  outside  of  the  input  image 
is  set  as  zero,  as  shown  in  Figure  3.2. 


Image 

Transformation 


Figure  3.1.  Image  Transformation 
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Input  Image 


New  Image  Zero  Part 


To  generate  the  measurement  function  m(x,y),  a measurement  function  block  is 
defined  as  in  Figure  3.3,  which  corresponds  to  step  3)  in  Section  2.3.  It  should  be  noticed 
that  in  Figure  3.3  the  measurement  function  refers  only  to  one  point  (x,y). 

Performing  the  Measurement  Function  Calculation  over  the  whole  input  image  and 
searching  for  the  maximum  and  its  location  in  the  4D  space  gives  the  translations  along  the 
x and  y axes,  rotation  and  scaling  factor.  These  operations  correspond  to  step  4)  and  5)  in 
Section  2.3.  The  whole  algorithm  is  displayed  in  Figure  3.4,  in  which  the  examination  of 
peak  sharpness  is  added  for  object  recognition.  The  dotted  line  area  is  called  the  basic 
search  block,  which  is  used  in  both  multiscale  and  parallel  implementations. 
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Measurement  Function  Calculation 


Figure  3.3.  Measurement  Function  Calculation 
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Figure  3.4.  Basic  Implementation  of  the  Algorithm 


The  processing  time  of  this  implementation  is  rather  long.  If  the  size  of  the  image 
is  64x64  pixels,  the  processing  time  is  3 1 minutes  on  a Silicon  Graphics  Crimson  Elan 
VGXT  workstation  with  a 50  MHz  MIPS  R4000  CPU.  In  this  case,  it  is  not  feasible  to 
directly  process  images  with  a size  of  512  x 512  pixels,  which  is  standard  for  most  image 
acquiring  equipment.  To  overcome  this  computational  barrier,  more  advanced 
implementations  need  to  be  employed. 
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3.2  Multiscale  Implementation 


As  the  first  step  using  a multi  resolution  approach,  the  original  image  must  be 
scaled  down  several  levels,  i.e.,  the  size  of  a 256  x 256  image  is  reduced  to  lower  levels 
128  x 128  and  64  x 64  by  a factor  of  2 at  each  reduction.  The  three  levels  of  the  image 
form  a pyramid  structure1291,  which  is  illustrated  in  Figure  3.5.  First  the  recognition  is 
processed  at  the  64x64  level  using  the  algorithm  described  in  the  last  section.  After 
checking  64  x 64  “mushrooms”,  the  maximum  value  of  the  measurement  function  can  be 
determined,  whose  location  indicates  the  position  of  the  object  as  a pixel  at  that  level 
which  is  marked  as  black  in  Figure  3.5.  This  pixel  actually  represents  a block  of  2x2  pixels 
at  the  higher  level  of  128  xl28,  marked  by  the  shadow.  The  next  step  is  processing  the 
image  at  the  level  of  128  x 128  using  the  same  algorithm.  But  this  time  only  four 
“mushrooms”  growing  at  the  block  of  pixels  need  to  be  examined.  Once  a new  pixel  at  this 
level  is  allocated,  which  is  marked  as  black,  the  recognition  procedure  is  applied 
repeatedly  to  the  level  of  a 256  x 256  image. 


Figure  3.5.  Pyramids  Structure 
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It  should  be  pointed  out  that  the  multi  resolution  approach  is  used  here  solely  for 
the  purpose  of  overcoming  the  limitations  of  current  computer  speed.  It  may  not  be 
necessary  in  the  recognition  scheme  if  a larger  and  faster  computer  is  available.  Thus  any 
scaling  factor  can  be  chosen  as  long  as  the  speed  and  the  final  resolution  are  satisfactory. 
However,  the  fast  Fourier  Transformation  usually  requires  the  size  of  the  image  must  be  a 
power  of  2.  Futher,  the  higher  the  resolution  at  the  starting  level,  the  better  the 
recognition  result,  but  the  slower  the  processing  speed. 


Figure  3.6.  Multiscale  Implementation  of  the  Algorithm 
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Figure  3.6  shows  the  algorithm  of  the  multiscale  implementation,  in  which  M 
represents  the  number  of  scales,  subscript  m indicates  a particular  scale  level,  Nm  is  the 
size  of  an  image  at  level  m,  (xm,  ym)  is  the  coordinates  of  the  center  of  the  object  in  the 
image  at  level  m. 

3.3  Parallel  Processing 

From  the  discussion  in  Section  3.1  it  can  be  seen  that  the  generation  of  the 
measurement  function  m(x,  y ) in  Eqn.  (18)  is  computationally  expensive.  To  calculate  a 
point  of  m(x,  y),  the  necessary  steps  include:  polar-logarithm  transformation,  2D  fast 
Fourier  transformation(FFT),  phase-only  filtering,  and  2D  inverse  FFT.  If  an  input  image 
is  of  a size  64  x 64,  there  will  be  4096  points  in  ra( x,  y),  and  the  above  procedure  must  be 
executed  4096  times.  Generating  these  points  consumes  more  than  99%  of  the  processing 
time.  However,  it  also  can  be  seen  from  Figure  3.3  that  to  compute  a point  of  m(x,  y),  the 
input  includes  a template  phase-only  filter  and  a 2D  subspace  of  the  hyper-image  space, 
which  can  be  directly  calculated  from  the  given  template  and  the  scene  image 
independently.  There  is  no  need  for  any  other  intermediate  input  from  any  neighboring 
points.  This  data  structure  is  the  best  for  parallel  processing,  since  all  processors  use  the 
same  input  and  there  is  no  interconnection  between  them.  Therefore,  a natural  approach  to 
reduce  the  processing  time  is  the  utilization  of  multiple  processors  of  a parallel  computer. 
Assuming  that  the  speed  of  a parallel  processor  is  the  same  as  the  speed  of  a serial 
processor,  the  parallel  processing  approach  can  decrease  the  time  to  approximately  1/M  of 
the  time  spent  on  a serial  implementation,  where  M is  the  number  of  processors  used.  For 
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instance  in  our  previous  example,  if  64  processors  are  used  for  a 64x64  image,  each 
processor  needs  calculating  only  64  points  of  m(x,  y ) instead  of  4096  points  as  in  serial 
processing.  Therefore,  the  processing  time  will  be  approximately  1/64  of  the  time  as  that 
in  serial  processing. 

The  parallel  processing  implementation  in  this  research  employs  a host/node 
mode1301.  A Silicon  Graphics  4DLG  workstation  with  a 33  MHz  MIPS  R2000A/R3000 
processor  is  used  as  a host  machine,  and  a supercomputer  nCUBE  2 provides  64  nodes  for 
parallel  computation,  where  each  node  has  a 7.5  MHz  MIPS  processor  and  16  Mbytes  of 
memory.  Figure  3.7  illustrates  the  system  configuration.  The  host  machine  provides  a user 


Figure  3.7.  System  Configuration  for  parallel  processing 


interface  and  performs  processing  when  only  a single  processor  is  necessary.  The  major 
part  of  the  computation  is  accomplished  with  parallel  processing  computation.  This 
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host/node  model  allows  the  vision  system  to  take  advantage  of  the  features  of  both  the 
host  computer  Silicon  Graphics  and  the  nCUBE  2 supercomputer,  since  a program  can 
offload  intensive  calculations  to  the  processor  array  while  maintaining  access  to  libraries 
and  utilities  specific  to  the  host  computer.  For  example,  the  host  program  launches  a 
parallel  program  in  the  processor  array  to  generate  the  measurement  function,  and  displays 
the  parallel  program’s  results  using  the  Motif  Window  System,  and  interface  with  Silicon 
Graphics’  Distributed  Graphics  Library  or  other  graphics  packages  such  as  Inventor. 


Figure  3.8.  Parallel  Implementation  of  the  Algorithm 
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Figure  3.8  illustrates  the  parallel  implementation.  Comparing  this  figure  with 
Figure  3.4,  it  can  be  seen  that  the  only  difference  between  them  is  that  more  processors 
participate  in  the  basic  search  in  parallel  processing,  therefore,  the  computing  load  for 
each  processor  is  decreased.  Except  adding  the  part  for  locating  multiple  processors  and 
managing  their  communication,  the  code  for  serial  implementation  can  be  directly  used  for 
parallel  implementation  with  little  modification. 

The  improvement  using  parallel  processing  on  the  supercomputer  is  significant. 
The  processing  time  is  51  seconds  fora512x512  pixel  image,  when  a image  size  of 
64x64  pixels  is  chosen  as  the  starting  level  with  the  multiresolution  approach.  The 
processing  time  is  reduced  drastically  compared  to  3 1 minutes  using  the  previous  serial 
implementation  on  a Silicon  Graphics  Crimson,  although  its  CPU  is  50  MHz  that  is  only 
half  as  fast  as  the  host  computer  and  5.7  times  faster  than  the  node  computer. 


CHAPTER  4 

EXPERIMENTAL  EXAMPLES 


In  this  section,  experimental  results  on  both  synthetic  and  real  images  are  shown. 
Unless  specified,  all  images  displayed  are  in  a gray  scale  ranging  from  0 to  255  with  an 
image  size  of  512x512  pixels  regardless  of  their  displaying  size.  In  these  images,  the  origin 
of  the  attached  coordinate  system  is  located  at  the  center  of  the  image.  The  jc  axis  is 
horizontal  and  the  y axis  is  vertical.  The  positive  direction  of  the  x axis  is  pointed  from  the 
right  to  the  left,  and  the  positive  direction  of  the  y axis  is  downward. 

4, 1 Experiments  with  Simple  Synthetic  Images 

The  first  test  shown  in  Figure  4. 1 is  to  recognize  a square  in  a clean  image.  In 
Figure  4.1,  (a)  displays  an  object  and  (b)  its  template,  where  the  template  is  twice  as  large 
as  the  object.  Figure  4. 1 (c)  and  (d)  are  the  results  for  the  images  down  sized  to  a 64x64 
pixel  level.  Figure  4.1(c)  displays  the  subspace  correlation  lg(xo,yo,0,/)  I,  which  is  the 
rotation  and  scale  detection  result.  The  x axis  represents  the  rotation  -K  < 0<  7t,  and  the  y 
axis  represents  scaling  change  -ln64</<  In 64.  There  are  four  bright  dots  present  in  (c), 
whose  y coordinates  are  all  the  same  and  equal  to  -10,  and  whose  x coordinates  are  at  32, 
16,  0,  -16  respectively.  That  means  there  are  four  replicas  of  the  template  with  the  same 
scaling  factor  k0  =e-|0xln64/64  =0.5221,  but  with  different  rotations  at  180°,  90°,  0°,  and 
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-90°  respectively.  The  peak  sharpness  value  of  19.768  signifies  the  existence  of  the  object. 
It  should  be  noticed  that  when  calculating  the  peak  sharpness  for  this  image,  n = 4 in 
Eqn.(21)  since  there  are  four  maximums  present.  Figure  4.1(d)  is  the  translation  detection 
result  where  the  brightest  dot  present  at  the  center  of  the  image  indicates  that  no 
translation  has  occurred.  Figure  4.1(d)  is  also  the  measurement  function  in  Eqn.(18), 
which  can  be  visualized  as  a “mushroom  field”.  The  pixel  value  in  Figure  4.1(d)  is  the 
maximum  magnitude  of  the  correlation  coefficients  in  the  “mushroom”  growing  at  this 
pixel. 


Figure  4.1(e)  to  (j)  display  the  procedure  of  multiresolution  implementation  step  by 
step,  in  which  (e)  and  (f)  are  the  results  for  the  image  size  of  128x128  pixels,  (g)  and  (h) 
are  the  results  for  the  image  size  of  256x256  pixels,  and  (i)  and  (j)  for  image  size 
512x512,  which  give  the  final  results  of  this  recognition  task.  Figure  4.1  (e),  (g),  and  (i) 
are  rotation  and  scaling  detection  results.  Their  coordinate  systems  are  the  same  as  the  one 
in  (c),  except  that  the  range  of  the  vertical  axis  is  changed  to  -In  Af  < / < InM,  where 
M=  128,  256,  512,  respectively,  corresponding  to  each  level.  Figure  4.1(f),  (h),  and  (j)  are 
the  translation  detection  results,  where  there  are  four  squares  in  every  figure,  and  each  of 
these  square  represents  a pixel.  In  this  multiresolution  discussion  the  coordinate  system  in 
(d),  (f),  (h),  and  (j)  differs  from  the  definition  given  at  the  beginning  of  this  Chapter.  The 
origin  of  the  coordinate  system  here  is  at  the  upper  left  comer  of  the  image.  The  x axis 
points  towards  right  and  the  y axis  is  downwards.  The  four  pixels  in  (f)  are  corresponding 
to  one  pixel  in  (d),  which  is  the  brightest  pixel  in  the  center  with  coordinate  (32,  32).  The 
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Figure  4.1.  A simple  test  for  object  recognition  using  multiresolution 
technique,  (a)  is  the  scene  and  (b)  is  the  template.  The  square  in  (b)  is  twice  as 
large  as  the  one  in  (a),  (c)  gives  the  result  of  the  rotation  and  scaling  detection. 
The  four  brightest  dots  signify  the  presence  of  four  replicas  of  the  template.  The 
coordinates  of  the  four  dots  indicate  that  their  scaling  factors  are  all  0.5,  and  their 
rotations  are  180 , 90 , 0 , -90  respectively,  (d)  is  the  result  of  the  translation 
detection,  where  the  brightest  dot  at  the  image  center  indicates  that  there  is  no 
translation,  (e)  to  (j)  are  the  recognition  results  at  higher  resolution,  where  (e), 
(g),  and  (i)  are  for  rotation  and  scaling  detection,  and  (f),  (h),  and  (j)  are  for 
translation  detection. 
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(e)  Level  128x128 


x = 64,  y = 64 
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(g)  Level  256x256 


(i)  Level  512x512 


x = 257 
y = 256 


x = 256  x = 257 
y = 257  y = 257 


* = 256,  y = 256 


(j) 
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black  pixel  (64,  64)  in  (f)  has  the  maximum  measurement  function  value,  which  in  turn 
represents  the  four  pixels  at  the  level  of  256x256,  which  are  shown  in  (h).  Imaging  the 
brightest  pixel  in  (d)  as  the  top  black  pixel  in  Figure  3.5,  then  (f),  (h),  and  (j)  can  be 
viewed  as  the  shadow  parts  in  each  level  of  the  pyramids  in  Figure  3.5.  In  Figure  4. 1 (i), 
the  brightest  dot  is  located  at  (0,  -57),  which  indicates  that  there  is  no  rotation,  and  the 
scaling  factor  is  k0  = e-57xln5 12/512  = 0.4993  that  is  more  accurate  than  0.5221  obtained 
from  (c).  Comparing  (i)  and  (c)  it  is  clear  that  the  peak  in  (i)  is  much  shaper  than  the  one 
in  (c).  Observing  (c),  (e),  (g),  and  (i)  another  interesting  phenomena  is  that  in  (c)  and  (e) 
there  are  four  replica  but  in  (g)  and  (i),  there  is  only  one.  The  explanation  to  this  is  as 
follows:  The  input  images  for  each  level  is  obtained  from  the  original  512x512  image 
using  a “half  size”  operator  in  a software  package  named  XV  (X  window  image  Viewer). 
In  (i)  and  (g)  the  input  square  signals  are  in  sizes  with  even  number  of  pixels,  therefore, 
the  centers  of  the  squares  are  (255.5,255.5)  and  (127.5,  127.5),  respectively.  Due  to  the 
discrete  spatial  grid,  pixels  (256,  256)  and  (128,  128)  are  considered  as  the  centers. 
However,  around  these  centers  the  images  are  not  symmetric,  which  results  in  that  only 
one  object  is  found.  For  (e)  and  (c)  in  the  downsize  procedure  somehow  input  images 
have  one  pixel  error.  As  the  result,  the  images  have  sizes  in  odd  numbers  of  pixels  and  are 
centered  at  (64,  64)  and  (32,  32),  respectively.  In  this  case  the  square  signals  are 
symmetric  with  respect  to  x an  y axes,  and  there  are  four  replica  existing.  This  example 
shows  that  the  recognition  method  gives  a one-pixel  accuracy. 

The  second  test  is  shown  in  Figure  4.2.  It  is  the  recognition  of  a triangle  that  is 
translated,  rotated  and  scaled  relative  to  the  template.  In  Figure  4.2,  (a)  is  the  scene  and 
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Figure  4.2.  A test  of  object  recognition,  (a)  is  an  object  and  (b)  is  its  template. 
There  are  relative  translations,  rotation,  and  scale  changes  between  them,  (c)  is  the 
result  for  scale  and  rotation  detection,  where  the  coordinates  of  the  brightest  dot 
indicate  that  the  rotation  is  90  degrees  and  the  scale  factor  is  0.6.  (d)  is  the  result 
for  translation  detection,  where  the  displacement  of  the  brightest  dot  to  the  image 
center  shows  the  translations  between  the  object  and  template  are  (15,-16).  In  this 
figure  all  images  are  in  a size  of  64  x 64  pixel. 
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(b)  is  the  template.  Figure  4.2  (c)  is  the  result  of  the  rotation  and  scale  detection,  whose 
peak  sharpness  is  18.122  indicating  the  existence  of  the  object.  The  coordinates  of  the 
brightest  dot  are  (16,  -8),  signifying  a rotation  of  90°  counter  clockwise  with  a scaling 
factor  of  k0  = <T8*ln 64/64  = 0.6.  Figure  4.2  (d)  is  the  result  of  translation  detection.  The 
coordinates  of  the  brightest  dot  indicate  that  the  relative  translation  between  the  two 
triangles  are  15  pixels  in  the  x direction  and  -16  pixels  in  the  y direction.  In  Figure  4.2  (b), 
the  edges  of  the  triangle  are  rather  irregular.  This  is  noise  caused  by  sampling  errors  in  the 
procedure  to  reduce  the  image  size  from  the  original  512  x 512  to  the  current  64  x 64. 
However,  this  noise  does  not  affect  the  recognition  result,  which  is  evidence  that  the 
method  is  insensitive  to  noise. 

It  is  interesting  to  see  what  will  happen  if  the  object  is  not  present  in  the  scene,  for 
instance,  if  a square  is  chosen  as  the  scene  and  a triangle  as  the  template,  as  shown  in 
Figure  4.3  (a)  and  (b).  Figure  4.3(c)  shows  the  subspace  correlation.  It  can  be  seen  that 
there  is  not  a bright  dot  present  in  (c),  in  contrast  to  Figure  4.1(c)  and  4.2(c).  The  value  of 
the  peak  sharpness  is  increased  to  24.48 1 , which  is  higher  than  that  shown  in  the  first  two 
examples.  Figure  4.3(d)  is  the  measurement  function,  where  the  brightest  dot  indicates  the 
location  of  the  maximum  value  of  the  measurement  function.  Since  from  Figure  4.3(c)  it  is 
known  that  no  object  is  present,  Figure  4.3(d)  does  not  have  any  meaning. 
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(b) 


(d) 


Figure  4.3.  An  example  when  no  object  is  present,  (a)  is  a scene  and  (b)  is  a 
template,  (c)  displays  the  subspace  correlation,  where  there  is  no  a single  bright 
dot  present.  This  agrees  with  the  large  value  of  the  peak  sharpness  24.481.  The 
image  size  here  is  64  x 64. 
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4.2  Autonomous  Object  Recognition 

In  this  section,  three  examples  of  object  recognition  in  a real  image  are  presented. 
The  scenes  in  the  examples  were  taken  by  a CCD  camera  with  an  original  image  size  of 
512x512.  These  images  were  then  reduced  to  a size  64  x 64  as  the  input  using  XV.  The 
template  is  obtained  from  the  same  scene  using  the  same  camera.  Then  the  template  is 
rotated  and/or  enlarged,  and  moved  to  the  center  of  a blank  image.  If  an  image  in  this  and 
the  next  section  is  a detection  result,  then  it  is  in  a size  of  64  x 64  pixels. 

The  first  example  is  the  detection  of  an  object  in  a cluttered  image.  In  Figure  4.4, 
(a)  shows  the  scene  and  (b)  shows  the  template  that  is  a camcorder.  In  the  scene,  the 
object  is  rotated  90°  clockwise  and  scaled  by  a factor  of  0.5  relative  to  the  template,  (c)  is 
the  result  of  rotation  and  scale  detection.  The  peak  sharpness  in  this  case  is  23.699,  which 
means  the  object  is  present.  The  coordinates  of  the  brightest  dot  are  (-16,  -10),  which 
represents  a counter  clockwise  rotation  -90°  and  a scale  factor  of  0.5.  (d)  is  the  translation 
detection  result,  where  the  displacement  of  the  bright  dot  to  the  image  center  indicates 
that  the  translation  of  the  centroid  of  the  object  in  the  x direction  is  -16  pixels,  and  in  the  y 
direction  -6  pixels. 

The  second  example  is  the  detection  of  an  object  with  occlusion.  In  Figure  4.5,  (a) 
shows  a scene,  and  (g)  shows  the  template  that  is  a projector.  In  the  scene,  the  template  is 
rotated  by  -90°  and  scaled  by  a factor  of  0.5.  (b)  is  the  result  of  rotation  and  scale 
detection,  where  the  peak  sharpness  is  24.132  indicating  the  presence  of  the  object.  The 
coordinate  of  the  brightest  dot  (-16,  0)  means  a rotation  of  clockwise  90°  and  no  scale 
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Figure  4.4.  The  object  recognition  in  a real  scene,  (a)  is  the  scene  and  (b)  is  the  template. 
In  (a)  the  object  is  translated,  rotated,  and  scaled  with  respect  to  the  template,  (c)  is  the 
result  for  scale  and  rotation  detection,  where  the  brightest  dot  indicates  that  the  rotation  is 
90°  and  scale  factor  is  0.5.  (d)  is  the  result  for  translation  detection,  where  the 
displacement  of  the  brightest  dot  to  the  image  center  shows  that  the  relative  translations 
between  the  object  and  the  template  are  (-16,  -6).  For  the  clear  representation  shown  in 
(d),  a pixel  value  less  than  250  in  the  image  is  set  to  zero. 
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change,  (c)  is  the  translation  detection  result  for  (a),  where  the  displacement  of  the  bright 
dot  to  the  image  center  indicates  the  translation  of  the  centroid  of  the  object  in  the  y 
direction  6 pixels.  Figure  4.5  (d)  displays  an  unoccluded  image  of  the  same  scene.  Figure 
4.5  (e)  and  (f)  are  recognition  results  for  (d),  where  the  peak  sharpness  for  (e)  is  22.938. 
Comparing  (b)  with  (e)  and  (c)  with  (f),  it  is  clear  that  the  coordinates  of  the  brightest  dots 
are  the  same  in  these  two  image  pairs  respectively  and  the  values  of  peak  sharpness  in  (b) 
and  (e)  are  close,  which  demonstrates  that  the  method  is  insensitive  to  occlusion. 

The  third  example  shows  the  recognition  of  two  objects  from  an  input  image.  In 
Figure  4.6,  (a)  is  a scene,  which  is  similar  to  the  one  shown  in  Figure  4.4  (a),  but  differs 
from  it  by  adding  another  camcorder  to  the  image  at  the  image  center.  The  template  is 
shown  in  Figure  4.4  (b),  which  is  twice  as  large  as  the  objects  in  the  scene.  Figure  4.6  (b) 
is  the  translation  detection  result,  the  two  bright  dots  indicate  that  two  objects  exist.  One 
is  located  at  (0,  0)  and  another  is  at  (-16,  -6).  Figure  4.6  (c)  shows  the  result  of  rotation 
and  scaling  detection  for  the  camera  located  at  the  center  (0,  0),  whose  peak  sharpness  is 
20.179,  signifying  the  presence  of  the  object.  The  coordinates  (0,  -10)  of  the  brightest  dot 
in  (c)  indicates  that  there  is  a scaling  factor  of  0.5  and  the  object  is  not  rotated.  Figure  4.6 
(d)  displays  the  rotation  and  scale  detection  result  for  the  object  located  at  (-16,  -6), 
whose  peak  sharpness  is  25.035,  signifying  the  existence  of  the  object.  The  coordinates 
(-16,-10)  indicate  that  the  object  is  rotated  -90°  and  scaled  by  a factor  of  0.5.  It  should  be 
noticed  in  Figure  4.6  (a)  that  the  two  objects  in  the  scene  have  different  illumination. 
However,  the  different  illumination  does  not  affect  the  results,  as  shown  in  (b),  (c)  and  (d). 
This  fact  indicates  that  the  method  is  robust  to  illumination  changes. 
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Figure  4.5.  Recognition  of  an  occluded  object,  (g)  is  the 
template  and  (a)  is  the  scene,  (b)  is  the  rotation  and  scale 
detection  result.  The  coordinates  of  the  brightest  dot  (-16,  0) 
gives  the  rotation  -90°  and  no  scaling  change,  (c)  is  the 
translation  detection  result  for  (a).  The  coordinates  of  the  bright 
dot  show  the  translations  are  (0,6).  (d)  displays  the  same  scene 
without  occlusion,  (e)  is  the  rotation  and  scaling  detection  and 
(f)  is  the  translation  for  the  image  in  (d).  Comparing  (b)  and  (c) 
with  (e)  and  (f)  it  can  be  seen  that  they  all  give  the  correct 
results,  which  demonstrates  that  the  method  is  robust  to 
occlusion.  For  the  clear  presentation  pixel  value  less  than  250  is 
set  to  zero  in  figure  (c)  and  (f). 
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Figure  4.6.  Multi-object  recognition,  (a)  is  the  scene  and  (b)  is  the  result  for  translation 
detection,  where  two  bright  dots  signify  that  there  are  two  objects  in  the  image.  The 
coordinate  for  the  first  object  is  (0,0)  and  for  the  second  is  (-16,  -6).  For  the  clear 
representation  shown  in  (b),  a pixel  value  less  than  235  in  the  image  is  set  to  zero,  (c)  is 
the  result  of  scale  and  rotation  detection  for  the  first  object  located  at  (0,0),  where  the 
brightest  dot  indicates  that  the  scale  is  0.5  and  there  is  no  rotation,  (d)  is  the  result  of  scale 
and  rotation  detection  for  the  second  object  located  at  (-16,  -6),  where  the  brightest  dot 
indicates  that  the  rotation  is  clockwise  90°  and  the  scale  is  0.5. 
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Figure  4.7.  An  example  when  no  object  is  present,  (a)  is  the  scene  and  (b) 
is  the  template,  (c)  shows  the  subspace  correlation.  It  can  be  seen  that  there 
is  no  a single  bright  dot  present  in  (c),  which  agrees  with  the  large  value  of 
peak  sharpness  of  30.186.  (d)  displays  the  measurement  function  where  the 
brightest  dot  in  the  upper  right  corner  indicates  the  location  of  the 
maximum.  However,  since  the  object  does  not  appear  in  the  scene,  it  does 
not  give  the  translations  of  the  object  as  was  the  case  in  the  previous 
examples. 
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Again  it  should  be  examined  what  will  happen  if  the  object  does  not  exist  in  the 
input  image  for  the  case  of  a real  scene.  Figure  4.7  (a)  is  a scene  and  Figure  4.7  (b)  is  a 
template  that  is  not  present  in  the  scene.  Figure  4.7(c)  displays  the  subspace  correlation.  It 
can  be  seen  that  there  is  no  a bright  dot,  and  the  big  value  of  the  peak  sharpness  30. 1 86 
agrees  with  that.  Figure  4.7(d)  shows  the  measurement  function  which  is  meaningless  for 
this  case. 


4.3  Image  Registration 

Image  registration  is  another  area  in  computer  vision  which  is  closely  related  to 
object  recognition.  Its  applications  include  estimation  of  the  motion  of  a moving  object 
from  a sequence  of  images1321,  detection  of  missing  and  unanticipated  objects1181,  and  the 
creation  of  a mosaic  from  a series  of  satellite  images.  There  are  three  common 
approaches132'331  to  solve  this  problem:  (1)  the  feature-based  approach1321,  (2)  the  optical 
flow  approach1331,  and  (3)  the  Fourier  transformation  approach133'341.  Among  these 
approaches,  the  Fourier  transformation  method  is  characterized  by  outstanding  robustness 
against  correlated  noise  and  disturbances,  such  as  those  encountered  with  nonuniform, 
time-varying  illumination.  Castro  and  Morandi1341  applied  this  method  to  detect  both 
translational  and  rotational  movements.  The  method  presented  in  this  dissertation  extends 
the  result  of  Castro  and  Morandi  to  cover  scale  changes.  In  addition,  since  this  method  is 
in  the  category  of  the  Fourier  transformation  approach,  it  preserves  all  advantages  in  the 
work  of  Castro  and  Morandi.  If  one  frame  of  the  image  is  viewed  as  the  object  and 


49 


another  is  viewed  as  the  scene,  then  the  image  registration  task  is  to  detect  the 
translations,  rotations,  and  scale  changes  between  these  two  frames. 

The  first  example  in  Figure  4.8  illustrates  image  registration.  Figure  4.8  (a)  and  (b) 
are  the  two  input  images  where  the  first  is  translated  with  respect  to  the  second  due  to  a 
camera  motion.  Figure  4.8  (c)  shows  the  rotation  and  scale  detection.  The  brightest  dot  at 
the  center  means  that  there  is  no  rotation  and  scale  change  between  two  frames.  Figure 

4.8  (d)  is  the  result  of  translation  detection,  where  the  displacement  of  the  bright  dot  to 
the  center  of  the  image  indicates  a translation  in  the  x direction  of  6 pixels.  Note  that  the 
cone  in  the  second  image  is  missing  but  it  does  not  affect  the  result.  Using  this  technique 
to  determine  the  translation,  rotation,  and  scaling  of  the  two  scenes  will  make  it  possible 
to  perform  image  subtraction  in  the  referenced  scenes  in  order  to  identify  the  missing  or 
unanticipated  objects.1181 

The  second  example  in  Figure  4.9  displays  image  registration  with  translations  and 
a scale  change.  In  Figure  4.9,  (a)  is  the  first  frame  and  (b)  is  the  second,  where  the  first 
frame  is  scaled  and  translated  with  respect  to  the  second  using  image  manipulation.  Figure 

4.9  (c)  is  the  rotation  and  scaling  detection,  where  the  coordinates  of  the  brightest  dot 
(0,  6)  indicate  that  there  is  a scaling  factor  of  1.5  of  the  first  frame  with  respect  to  the 
second,  and  that  no  rotation  occurred.  Figure  4.9  (d)  is  the  result  of  the  translation 
detection,  where  the  coordinates  of  the  bright  dot  are  (15,  10),  which  give  the  relative 
translation  values.  It  can  be  noticed  that  the  translation  values  in  this  example  are  quite 


large. 


50 


Figure  4.8.  An  example  of  image  registration,  (a)  is  the  first  frame  and  (b)  is  the 
second  frame,  (c)  is  the  rotation  and  scale  detection,  where  the  brightest  dot  is  at 
the  center  indicating  that  there  is  no  rotation  and  scale  change,  (d)  is  the  result  of 
translation  detection,  where  the  bright  dot  indicates  that  the  translation  value  is 
(6,  0).  For  the  clear  representation  as  shown  in  (c),  any  pixel  value  less  than  250  in 
the  image  is  set  to  zero. 
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Figure  4.9.  A test  of  image  registration,  (a)  is  the  first  frame,  and  (b)  is  the  second  frame, 
(c)  is  result  for  the  rotation  and  scaling  detection,  where  the  brightest  dot  at  the 
coordinate  (0,6)  indicates  a scaling  factor  of  1 .5  and  no  rotation  change  occurring,  (d)  is 
the  translation  detection,  where  the  bright  dot  indicates  the  translation  of  the  first  frame 
with  respect  to  the  second  one  is  (15,  10).  For  a clear  representation,  any  pixel  value  less 
than  252  in  the  image  is  set  to  zero. 
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4.4  Remarks 

1 . In  summary,  nine  examples  have  been  presented  in  this  Chapter.  The  example  in 
Figure  4. 1 shows  the  recognition  of  a square  image  which  showed  that  the  method  is  able 
to  find  all  possible  rotation  parameters.  The  next  example  of  a triangle  image  in  Figure  4.2 
showed  that  the  method  is  insensitive  to  noise.  The  example  in  Figure  4.4  illustrated  the 
recognition  of  an  object  embedded  in  its  natural  surroundings.  The  example  in  Figure  4.5 
demonstrated  that  the  method  is  robust  when  occlusion  happens.  The  example  in  Figure 
4.6  displayed  a result  of  multiple  object  recognition.  This  example  also  indicated  that  the 
method  is  insensitive  to  illumination  changes.  The  example  in  Figure  4.7  showed  the  image 
registration  with  pure  translations,  where  an  object  is  missing  in  one  frame.  The  example 
in  Figure  4.8  illustrated  the  image  registration  with  scale  change  and  large  translations. 
Examples  in  Figure  4.3  and  Figure  4.7  showed  that  the  method  is  able  to  detect  when  the 
object  is  not  present  in  the  input  image  using  the  peak  sharpness  value.  Examples  from 
Figure  4.4  to  Figure  4.9  are  rather  difficult  problems  in  computer  vision.  With  the  method 
in  this  dissertation  they  are  solved  in  a straight  forward  manner.  This  research  claims  that 
within  the  scope  of  the  planar  distortions,  object  recognition  can  be  reduced  to  a simple 
routine. 

2.  It  is  a well  known  fact  that  the  correlation  recognition  method  is  very  sensitive  to 
any  shape  and  view  point  changes181.  Figure  4.10  shows  an  example  of  this  sensitivity  in 
term  of  a view  point  change.  In  Figure  4. 10,  (b)  is  a template  and  (a)  is  a scene,  where 
there  are  three  replicas  of  the  template  present.  Among  three  cans  the  one  on  the  top  of  an 
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instalment  has  a rather  different  view  point  compared  with  other  two.  (c)  is  the  translation 
detection,  where  two  bright  dots  located  at  (-1 16,16)  and  (82,19)  indicate  that  only  two 
cans  are  recognized  and  the  third  one  is  not.  Another  example  of  the  sensitivity  to  the 
shape  change  is  shown  in  Figure  4.1(c)  and  (e),  where  in  the  levels  64  and  128  due  to  one 


(a) Input  scene 


Figure  4.10.  An  example  of  the 
sensitivity  of  the  method  to  the  view 
point  changes,  (a)  is  the  input  scene 
where  there  are  three  coca  cans  and  the 
third  one  is  on  the  top  of  an  instrument, 
which  differs  from  the  other  two  rather 
significantly  in  terms  of  the  view  point, 
(c)  shows  the  detection  result  where  the 
two  bright  dots  indicate  that  the  first  two 
objects  are  detected  but  the  third  one 
is  not. 


(c)  Object  detection 
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pixel  size  difference  the  images  become  symmetric  then  four  replica  are  detected  instead  of 
one,  as  shown  in  Figure  4.1(g)  and  (i).  Overcoming  this  sensitivity  to  create  a fault 
tolerate  scheme  was  the  motivation  for  developing  an  invariant  method181,  and  the  invariant 
introduces  many  problems  associated  with  object  entity  distortions.  In  this  dissertation 
however,  this  sensitivity  allows  for  the  accurate  determination  of  the  planar  movement 
parameters,  and  a one-pixel  level  accuracy  in  detection  is  achieved.  With  the  same 
philosophy  the  author  believes  that  efforts  should  be  made  to  extend  the  method  to  3D  for 
accurately  detecting  the  off-plane  rotation  parameters.  Furthermore  the  author 
discourages  any  attempt  to  use  this  method  for  a fault-tolerate  recognition  scheme.  With 
regard  to  applications  where  shape  changes  are  inevitable,  such  as  galaxy  detection, 
microscopic  analysis,  hand  writing  recognition  etc.,  this  method  may  not  be  directly 
applicable  without  substantial  modification  because  of  its  sensitivity  to  any  shape  change. 


CHAPTER  5 
APPLICATIONS 


The  theoretical  and  experimental  investigations  in  previous  chapters  have 
demonstrated  that  the  proposed  recognition  method  is  very  successful  for  detecting 
objects  with  planar  distortions.  The  fast  processing  speed  and  robustness  to  occlusion, 
noise  and  luminance  changes,  make  the  recognition  method  valuable  to  a broad  spectrum 
of  applications.  The  most  distinguishing  characteristic  of  the  method  however,  is  the 
ability  to  directly  process  raw  data  using  imagery  as  the  template,  and  the  template  may  be 
generated  by  computer  graphics.  This  characteristic  allows  sensory  data  to  be  directly 
compared  with  the  world  model  graphically,  and  therefore,  unites  computer  vision  and 
computer  graphics  under  the  scheme  of  pattern  recognition.  With  the  development  of  this 
method,  the  a priori  knowledge  about  the  environment  can  be  utilized  in  the  form  of  a 
pre-modeled  synthetic  environment  to  guide  the  vision  system  and  to  compensate  the 
sensory  data.  This  scheme  is  superior  to  a scheme  exclusively  using  the  traditional  feature- 
based  computer  vision  approach  in  terms  of  reliability,  accuracy,  speed,  and  automation. 

In  this  chapter,  two  examples  will  be  presented  to  illustrate  how  this  scheme  is  used  for 
some  traditional  computer  vision  problems.  The  first  example  is  facility  characterization 
for  hazardous  waste  cleanup.  The  second  is  the  generation  of  aerial  imagery-based  virtual 
reality.  These  examples  show  that  the  proposed  recognition  method  leads  to 
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unconventional  solutions  for  these  vision  problems  and  demonstrate  their  potentials  for 
finally  delivering  prototype  systems  to  accomplish  these  tasks. 

5.1  Virtual  World  Model  Verification  and  Reconciliation 
A main  task  of  sensing  for  robotics  and  automation  is  providing  3D  geometrical 
environment  information  to  robot  control/visualization  systems,  which  is  often  referred  as 
facility  characterization  or  world  mapping.  In  radioactive  waste  cleanup  operations  where 
the  environment  is  hazardous,  accurate  facility  characterizations  is  critical  because  remote- 
controlled  robots  must  be  used  for  a safe  and  orderly  cleanup  process,  to  essentially 
reduce  operator’s  exposure  to  radiation  in  waste  cleanup.  The  basic  requirements  for 
world  mapping  include  reliability,  accuracy,  speed,  and  automation.135'361  This  section 
proposes  a facility  characterization  system  particularly  designed  for  the  Tank  Waste 
Retrieval  (TWR)  task,  in  which  the  environment  is  well  structured.  The  central  ideas  of 
the  system  includes  pre-modeling  the  environment  as  a virtual  world  database  and  using 
vision  sensors  to  detect  any  difference  in  terms  of  missing  and/or  unanticipated  objects. 
Because  the  number  of  missing  and/or  unanticipated  objects  would  be  rather  small  the 
reconstruction  of  these  objects  and  updating  the  virtual  world  are  considerably  easier  than 
building  a full  database.  This  system  combines  the  strength  of  computer  vision  and 
computer  graphics  and  utilizes  the  a priori  information  extensively.  The  key  to  the  success 
of  the  system  is  detecting  the  difference.  Employing  the  recognition  method  in  this 
dissertation  to  perform  image  registration,  the  virtual  and  real  images  can  be  aligned  and 
any  differences  can  then  be  detected  using  image  subtraction.  The  initial  testing 
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demonstrates  the  feasibility  of  the  system,  although  a considerable  of  amount  of  work 
needs  to  be  done  to  deliver  a demonstration  system. 

5.1.1  Background 

The  goal  of  the  TWR  task  is  the  removal  of  waste  in  the  form  of  radioactive  sludge 
and  solids  which  are  stored  in  tanks  within  the  U.S.  Department  of  Energy  (DOE) 
complex  sites  in  Hanford,  Washington,  Oak  Ridge,  Tennessee,  Femald,  Ohio,  and  areas 
outside  of  Idaho  Falls,  Idaho.  Many  of  these  tanks  have  reached  their  intended  design  life 
and  some  are  leaking  to  the  environment.1341  These  tanks  are  about  75  feet  in  diameter  and 
30  feet  tall.  Inside  of  the  tanks  there  are  pipes  and  valves.  Although  all  original  design  are 
well  documented,  there  are  some  uncertainties  because  the  as-built  tank  is  different  from 
the  as-designed,  and  changes  happened  in  the  waste  dumping  procedure  and  during  the 
storage  over  many  years.  Due  to  the  radiological  hazard,  removal  must  be  done  using 
remote  technologies.  It  is  envisioned  that  this  system  may  be  a computer  controlled  robot 
manipulator  equipped  with  multiple  sensors  which  is  operated  with  little  operator 
intervention. 1341  An  important  part  of  this  system  is  the  accurate  characterization  of  facility. 
This  characterization  will  be  used  for  robot  path  planning  and  task  performance. 

One  approach  to  the  characterization  problem  is  sensor-based  modeling  that  builds 
the  world  model,  depending  solely  on  the  information  obtained  from  3D  Sensors  such  as 
laser  range  finders,  structure  light,  and/or  stereo  cameras, 1371  assuming  without  any  a 
priori  information.  The  approach  described  by  Hebert,  Hoffman,  Johnson,  and  Osborn, 
starts  from  generating  a mesh  which  connects  the  nearest  neighbor  of  range  pixels,  and 
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this  mesh  is  used  as  a data  set  for  surface  segmentation.  The  next  step  is  the  segmentation 
of  the  mesh  into  planar  and  quadric  surface  patches  using  a region  growing  technique.  The 
final  step  is  the  incorporation  of  surface  patches  into  a world  model  of  the  robot  and 
workspace.  This  approach  may  work  well  for  simple  and  well-isolated  objects.  But  it  is 
doubtful  that  it  can  be  used  alone  to  build  a complicated  facility  model  because  of  many 
difficulties  existing,  such  as  occlusion,  multi  frame  image  registration,  long  computational 
time,  and  more  importantly,  the  lack  of  a reliable  way  to  incorporate  the  small  surface 
patches  into  a meaningful  model.  Moreover,  it  is  illogical  that  a robot  is  required  to  plan 
its  path  in  order  to  move  in  its  environment  to  gather  modeling  data  while  its  environment 
is  primarily  unknown. 

Another  category  of  research  is  to  construct  world  models  of  interior  workspaces 
which  utilizes  a priori  information  to  compensate  the  sensor  information.  This  approach 
uses  blue  prints  and  other  a priori  knowledge  to  model  the  virtual  world.  When  the  robot 
enters  a workcell,  the  sensory  information  is  gathered,  processed  and  integrated  to  update 
the  pre-modeled  virtual  world.  Christensen135,381  described  a supervised  teleoperated 
system  with  a world  model  that  includes  the  robot  configuration  and  information 
generated  by  the  robot’s  sensors.  Trivedi1391  reports  another  model-based  system  using 
laser  range  finders  that  allows  testing  of  robot  plans  in  simulation.  Thayer1401  et  al 
computes  an  object’s  location  and  orientation  using  stereo  vision  to  manually  update  the 
world  model.  This  type  of  approach  is  superior  to  the  exclusively  sensor-only  approach  for 
the  TWR  task  because  in  this  case,  a large  portion  of  information  about  the  environment  is 
already  available.  Utilization  of  this  information  to  directly  model  the  virtual  world  using  a 
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CAD  tool1411  is  much  more  realistic  than  modeling  solely  depending  on  3D  sensing. 
However,  in  the  above  systems  one  problem  remains  unsolved.  This  is  how  to  detect  what 
has  been  changed.  In  other  words,  how  can  the  virtual  model  world  be  directly  compared 
with  the  real  world  sensory  information  to  decide  what  part  of  the  sensory  data  should  be 
used  to  update  the  virtual  world  database.  To  allow  this  direct  virtual  to  real  world 
comparison,  a data  representation  that  is  suitable  for  both  CAD  models  and  sensory  data 
must  be  found,. 

Arman  and  Aggarwal  proposed  an  object  recognition  method  using  IGES  (Initial 
Graphics  Exchange  Specifications)  format  as  the  common  data  representation  for  3D 
object  recognition  using  CAD  generated  models[42].  IGES  is  a representation  developed  by 
the  National  Institute  of  Standards  and  Technology,1431  which  is  available  in  most  CAD 
software.  If  a recognition  method  is  capable  of  using  the  IGES  format  for  data 
representation,  it  has  the  potential  to  directly  compare  the  virtual  world  with  the  real 
world  and  to  generate  the  output  that  can  be  used  to  update  the  virtual  world  model. 
However,  as  pointed  by  Arman  and  Aggarwal  in  their  survey  paper1'1  that  the  major  hurdle 
in  using  the  IGES  format  is  the  lack  of  topological  information,  which  is  critical  to  most 
current  vision  systems.  Moreover,  the  goal  for  a CAD  system  is  fundamentally  different 
from  the  goal  of  a vision  system.  In  a CAD  system  that  is  primarily  developed  for  design 
or  simulation,  many  practical  considerations  must  be  included;  i.e.,  the  industry  standards 
and  manufacturer  procedures,  which  may  not  be  necessary  at  all  for  the  vision  application. 
On  other  hand,  in  a vision  system,  information  is  interpreted  in  terms  of  topology  and 
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basic  geometric  primitives,  i.e.,  vertices,  edges,  and  surface  patches,  which  is  not 
acceptable  by  a CAD  system  where  efficient  display  and  easy  manipulation  are  required. 

It  is  clear  that  the  key  to  the  success  of  updating  the  virtual  world  using  sensory 
information  is  a data  representation  with  which  the  virtual  world  can  be  directly  compared 
with  the  real  world.  The  solution  to  the  problem  is  surprisingly  simple,  the  image  itself  is 
the  most  appropriate  representation.  The  image  is  a 2D  array  with  a gray  scale  from  0 to 
255.  It  contains  the  information  about  the  camera  position  and  orientation.  On  other  hand, 
generating  a realistic  image  is  an  important  goal  for  a CAD  system.  It  can  be  predicted 
that  the  computer  produced  image  will  be  more  and  more  resemble  to  the  real  image  with 
the  continuous  development  of  the  computer  graphics.  The  recognition  method  in  this 
dissertation  utilizes  the  image  as  the  data  representation  for  matching,  which  provides  a 
powerful  tool  to  the  virtual/real  world  comparison. 

5.1.2  Proposed  System  for  the  Virtual  World  Updating 

Using  the  object  recognition  method,  a new  virtual  world  updating  system  is 
proposed.1 118,441  In  this  system,  the  environment  is  pre-modeled  as  the  virtual  world.  This 
virtual  world  database  provides  the  template  for  the  virtual/real  world  registration.  Once 
the  virtual/real  images  are  registered,  the  comparison  can  be  accomplished  by  subtraction. 
As  the  result  of  the  comparison,  any  objects  that  are  missing  or  any  additional 
unanticipated  objects  will  be  detected.  Utilizing  the  3D  information  of  the  objects,  their 
3D  surfaces  can  be  reconstructed.  This  surface  reconstruction  task  is  much 
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Figure  5.1  The  Procedure  of  Verification  and  Reconciliation  of  Virtual  World  Model 


62 


easier  than  the  one  used  in  the  sensor-only  approach,  because  only  well  isolated  objects  is 
of  the  concern  and  the  surroundings  of  the  object  are  already  available  for  robot  path 
planning.  In  this  scheme,  the  virtual  and  real  worlds  are  directly  compared  using  the  image 
registration  and  subtraction.  Therefore,  this  system  has  great  potential  to  accomplish  the 
TWR  task.. 

Figure  5.1  illustrated  the  structure  of  the  system.  The  robot  control  system  sends 
the  CCD  camera  to  a desired  position  to  take  an  image  of  the  interested  area  of  the 
environment.  This  positioning  information  is  used  by  the  virtual  camera  that  takes  a 
snapshot  from  the  virtual  world  database,  which  is  pre-modeled  using  a robotics 
simulation  software  called  IGRIP  (Interactive  Graphics  Robot  Instruction  Program). [42] 
Because  of  the  robot  positioning  errors,  the  real  camera  has  small  translations  and 
rotations  with  respect  to  the  virtual  camera,  and  the  real  image  and  the  virtual  image  do 
not  overlap  exactly.  In  order  to  eliminate  the  camera  positioning  errors,  the  real/virtual 
image  registration  must  take  place  as  described  in  Section  3.3.  Here  it  is  assumed  that  the 
real  image  is  rectified  to  satisfy  the  perfect  pin  hole  model  before  registration.  With  the 
information  of  the  translations,  rotations  and  scaling  changes  from  the  image  registration, 
the  accurate  3D  camera  position  can  be  calculated.  Relocating  the  virtual  camera  to  this 
position,  a new  virtual  world  image  can  be  acquired.  Using  a software  called 
RenderMan™146"471  to  apply  ray-tracing  and  texture  rendering  to  the  image  generates  a 
photo-realistic  image  for  real/virtual  world  comparison.  The  comparison  is  accomplished 
by  the  subtraction  of  the  real  image  from  the  virtual  image,  by  which  only  the  different 
parts  are  kept.  If  the  areas  of  the  difference  are  large  enough,  they  are  considered  as 
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missing/unanticipated  objects  and  their  corresponding  3D  information  is  called  in  to 
reconstruct  3D  surfaces.  The  geometrical  primitives  of  the  objects  can  be  determined  from 
the  reconstructed  surfaces  and  then  is  used  to  update  the  virtual  world  database. 

5.1.3  Initial  Testing 

In  this  section  examples  will  be  given  to  demonstrate  how  the  object  recognition 
method  works  for  virtual/real  world  comparison.  Figure  5.2  shows  an  example  of 
detecting  a missing  object.  In  Figure  5.2,  (a)  represents  the  virtual  image  although  it  is 
obtained  using  a CCD  camera,  and  (b)  is  the  real  image,  (c)  and  (d)  are  the  results  of  the 
image  registration,  which  show  that  there  is  a scaling  factor  of  0.7  and  no  rotation  and 
translation  happen.  The  pure  scaling  indicates  that  the  virtual  camera  moves  along  the  axis 
perpendicular  to  the  image  plane  with  respect  to  the  real  image.  Using  simple  triangulation 
it  can  be  calculated  that  this  translation  is  100cm.  Relocating  the  virtual  camera  to  the 
correct  place  and  rendering  another  image  results  in  (e).  Subtracting  (b)  from  (e)  generates 
(f),  where  a pipe  appears  as  a missing  object  in  the  real  world. 

It  should  be  pointed  out  in  this  example  that  the  virtual  images  in  Figure  5.2  (a) 
and  (e)  are  actually  obtained  using  real  sensors.  This  is  because  the  real  image  rectification 
and  the  virtual  image  rendering  modules  are  still  under  development.  Then  the  question 
may  arise  as  to  whether  or  not  the  virtual  image  can  be  used  for  registration  with  the  real 
image.  Figure  5.3  illustrates  an  example  of  the  virtual/real  image  registration.  In  Figure 
5.3,  (a)  is  a virtual  image  acquired  from  the  database  generated  by  IGRIP.  And  (b)  is  a 


real  image 
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(a)  The  virtual  image  from  database 


i 


(c)  The  scaling  and  rotation  detection 
result 


(e)  Image  from  relocated  virtual 
camera 


(b)  The  acquired  real  camera  image 


(d)  The  translation  detection  result 


(f)  The  detected  missing  pipe  by 
subtraction 


Figure  5.2.  Virtual/Real  Camera  Image  Registration 
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(a)  A virtual  image  acquired  from  a 
database  made  by  IGRIP. 
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(b)  A real  image  acquired  by  a CCD 
camera. 


(c)  The  result  of  virtual/real  image 
registration,  where  the  bright  dot  is 
located  at  the  center  of  image  indicates 
that  there  is  no  relative  movement 
between  to  images. 


Figure  5.3.  Virtual/real  image 
registration.  This  example  demonstrates 
that  virtual/real  image  registration  is 
feasible  regardless  small  differences 
between  two  images,  such  as  some  details 
in  the  real  image  and  artificial  illumination 
in  the  virtual  image.  The  author  is  grateful 
to  Dean  Haddox  for  kindly  providing 
Figure  5.3  (a)  and  for  building  the 
physical  model  shown  in  Figure  5.2  and 
5.3. 
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captured  with  a CCD  camera.  Both  (a)  and  (b)  are  blurred  using  a 5x5  averaging  mask  to 
eliminate  small  modeling  error,  (c)  shows  the  result  of  the  registration.  A bright  dot  at  the 
center  indicates  that  the  registration  program  works  very  well. 

5.2  Imagery-Based  Virtual  Reality 

Virtual  reality  is  becoming  increasingly  important  as  a tool  to  provide  cost 
effective  alternatives  for  training  and  to  provide  enhanced  capabilities  for  activities  such  as 
mission  planning  and  mission  rehearsal.  The  ability  to  generate  virtual  reality  utilizing 
photo  databases  created  by  remote  sensing  is  of  particular  interest,  because  these 
databases  are  well  established  and  cover  vast  land  masses  throughout  the  world.  This 
ability  will  extend  applications  of  the  photo  databases  from  high  altitude  to  low  altitude. 
One  example  of  such  applications  is  the  simulation  of  helicopter  operations.  The  chief 
requirements  for  imagery -based  virtual  reality  include  1 ) the  ability  to  fly  through  digital 
3D  databases  comprised  of  realistic  3D  representations  of  both  natural  and  cultural 
features,  2)  automatic  extraction  of  the  information  from  digital  imagery  without  manual 
intervention,  and  3)  a fast  processing  speed. 

In  this  section  an  innovative  approach  to  the  problem  will  be  proposed.  This 
approach  builds  a model  database  with  a CAD  tool  and  uses  the  pattern  recognition 
method  in  this  dissertation  to  select  the  correct  models  for  inserting  them  into  the  terrain 
elevation  data.  The  terrain  elevation  data  can  be  derived  from  stereo  photogrammetry  or 
obtained  from  the  geographical  database. 
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5.2.1.  Introduction 

The  main  challenge  to  generate  virtual  reality  from  the  satellite  and  aerial  imagery 
is  the  difficulty  of  reconstruction  of  3D  models  in  complex  urban  areas.  Although  there  are 
methods  such  as  factorization148'491  to  recover  a perspective  view  of  3D  structures  without 
computing  depth  as  an  intermediate  step,  a true  3D  scene  reconstructed  from  a stereo 
image  pair  is  more  desirable  for  virtual  reality  applications.  This  is  because  it  has  the 
potential  for  rendering  high  quality  3D  features  and  displaying  some  special  viewing 
effects  such  as  fly-through  and  multiple  perspective  views.  Unfortunately,  recent  research 
has  determined  that  the  use  of  stereo  disparity  alone  will  not  reliably  detect  and  delineate 
cultural  structures.1501  More  complicated  methods  for  the  fusion  of  monocular  cues  and 
stereo  disparities  have  been  investigated.1511  However,  despite  considerable  research 
efforts  from  both  the  computer  vision  and  photogrammetric  communities,  current 
techniques  to  extrude  the  features  which  give  the  terrain  a realistic  appearance  when  it  is 
viewed  from  ground  and  near  ground  level  requires  significant  amounts  of  computing  time 
even  for  simple  cases. 

Moreover,  it  is  argued  that  the  current  technical  effort  in  terrain  visualization 
depending  exclusively  on  digital  imagery  may  not  be  adequate,1521  because  there  are  some 
inevitable  difficulties  in  image-based  terrain  visualization,  such  as  occlusion,  distortion  and 
low  resolution.  These  difficulties  often  lead  to  misleading  visualization  of  the  sides  of 
buildings  or  trees,  the  height  or  depth  of  cultural  features,  or  what  is  underneath  trees  or 
other  camouflage.  The  alternative  to  a pure  photo-based  method  is  the  insertion  of  the 
synthesized  features  from  a database  of  3D  models  into  the  scenes  comprised  of  image 
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data  textured  onto  terrain  elevation  values. [52]  However,  the  necessity  of  extracting 
features  manually  in  this  method1501  results  in  its  failure  to  meet  the  automation 
requirement  in  the  feature  extrusion  application. 

Comparing  a variety  of  different  techniques,  it  can  be  concluded  that  the  insertion 
of  synthesized  features  into  a 3D  scene  is  a promising  scheme.  This  scheme  constructs 
sharp  cultural  features  by  the  maximal  utilization  of  the  a priori  knowledge  and  computer 
graphics,  instead  of  imposing  an  impractical  requirement  to  stereo  photogrammetry.  If  an 
automated  pattern  recognition  method  is  developed,  this  scheme  has  the  potential  to 
provide  high  fidelity,  full  automation,  high  speed,  and  geometrical  accuracy  to  feature 
extrusion.  The  pattern  recognition  method  in  this  dissertation  provides  a solution  to 
automatically  identify  the  interest  features  from  imagery.  Together  with  a fast  algorithm 
for  recovering  elevation  data  from  the  imagery,  a model-based  feature  extrusion  scheme 
can  be  developed  to  reconstruct  a 3D  scene  using  both  the  photo  database  and  the 
synthesized  feature  database.  This  fully  automated  approach  has  the  potential  to  provide 
efficiency  and  high  fidelity,  and  increase  the  throughput  significantly. 

In  this  research,  the  open  terrain,  natural  features,  and  other  unknown  cultural 
features  which  comprise  the  background  of  a 3D  scene  are  reconstructed  using  an  area- 
based  correlation  method  for  stereo  correspondence.  The  extrusion  of  known  buildings 
and  cultural  features  is  accomplished  by  inserting  3D  representations  of  pre-modeled 
features  into  the  3D  background  scene  reconstructed  from  stereo  correspondence.  The 
selection  of  feature  models  and  determination  of  their  locations  and  orientations  are 


achieved  by  the  pattern  recognition  method. 
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In  contrast  to  the  results  reported  from  previous  work,151’53'561  the  proposed  system 
will  provide  a 3D  scene  comprised  of  not  only  high  resolution  imagery  draped  over  true 
terrain  elevation  data,  but  also  geometrically  accurate  3D  representations  with  all  known 
textures  and  details.  The  employment  of  non-feature-based  object  recognition  and  stereo 
correspondence  methods  allows  full  automation  of  the  system.  The  multiscale  techniques 
and  the  parallel  data  structures  of  the  underlying  algorithms  provide  the  advantages  of 
utilizing  multiple  processors,  therefore,  significantly  increasing  the  throughput. 

The  philosophical  basis  of  the  proposed  approach  is  full  utilization  of  the  a priori 
knowledge  which  includes  image  geometry  and  the  information  from  the  features  of 
interest.  Image  geometry  is  applied  using  epipolar  geometrical  constraints.  The  feature 
information  is  extracted  by  pattern  recognition.  This  approach  is  superior  to  previous 
work  using  symbolic  knowledge  representations  to  generate  building  hypotheses,151’53'541 
which  are  often  very  complicated,  inadequate,  and  difficult  for  automation.  Taking 
advantages  of  the  latest  developments  in  pattern  recognition,  computer  graphics,  and 
photogrammetry,  this  approach  should  be  able  to  significantly  increase  the  throughput, 
fidelity,  and  degree  of  automation  in  the  process  of  feature  extrusion. 

The  rest  of  the  paper  is  organized  as  follows:  Section  5.2.2  gives  an  overview  of 
the  system;  Section  5.2.3  describes  the  image  registration  module;  Section  5.2.4  illustrates 
the  pattern  recognition  module;  Section  5.2.5  introduces  the  stereo  matching  module,  and 
Section  5.2.6  presents  the  modeling  and  rendering  module.  Finally  in  Section  5.2.7,  an 
experimental  example  is  shown,  which  demonstrates  the  effectiveness  of  the  proposed 
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5.2.2.  The  Proposed  Approach 

Using  the  object  recognition  method  as  a core,  a new  feature  extrusion  and  terrain 
visualization  system  is  proposed.  This  system  includes  five  computational  modules:  image 
registration,  3D  feature  database  modeling,  pattern  recognition,  stereo  correspondence, 
and  3D  scene  rendering.  Figure  5.4  illustrates  the  overall  system  and  the  feature  extrusion 
procedure.  In  this  procedure,  the  stereo  image  pair  and  the  3D  model  database  are  the 
input.  The  assumptions  about  the  stereo  images  are  that  the  interior  calibration  has  been 


3D  feature  Models 


Figure  5.4.  The  feature  extrusion  procedure 
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performed  to  satisfy  the  ideal  pin-hole  camera  model,  and  that  one  image  has  a known 
relationship  to  the  ground.  The  image  registration  determines  the  relative  3D  position  and 
angular  attitude  of  the  left  and  right  images  with  respect  to  each  other.  Using  this  position 
information  the  images  are  digitally  resampled  to  conform  to  the  epipolar  geometry 
constraint.  The  pattern  recognition  module  takes  the  resampled  stereo  pair  as  the  scene 
and  the  images  generated  from  the  3D  models  as  a template  to  identify  features  present  in 
the  scene  and  to  determine  their  coordinates  and  orientations.  The  stereo  matching  module 
generates  the  terrain  elevation  data.  Finally,  the  3D  scene  rendering  module  drapes  the 
textured  imagery  over  elevation  data  and  inserts  the  3D  features. 

5.2.3.  Image  Registration 

Image  registration  is  a key  initial  step  in  many  tasks  involving  the  automated 
interpretation  of  satellite  and  aerial  images.  The  function  of  image  registration  is  to 
identify  the  3D  position  of  each  camera  at  the  moment  of  exposure  and  the  angular 
relationship  between  the  cameras.  The  camera  position  determines  the  relations  between 
the  image  points  and  ground  points  in  the  scene.  The  position  relationship  between 
cameras  allows  the  application  of  the  epipolar  geometry  constraint  which  reduces  the 
search  for  stereo  correspondence  from  2D  to  ID. 

The  most  common  method  of  establishing  the  relative  position  and  orientation 
between  two  images  is  to  select  pairs  of  corresponding  control  points  such  as  landmarks  in 
the  two  images.  However,  general  landmark  matching  is  an  unsolved  problem,1531  and  the 
approaches  using  different  features  for  registration  are  not  only  very  complicated,  but  also 
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inevitably  need  operator  intervention  to  eliminate  false  control  features  such  as  moving 
buses  and  cars.1531  Another  more  promising  approach  is  correlation-based,  which  is  often 
implemented  using  the  Fourier  transformation.  In  contrast  to  other  image  registration 
methods,  the  Fourier  transformation  method  is  characterized  by  an  outstanding  robustness 
against  correlated  noise  and  disturbances,  such  as  those  encountered  with  occlusion  and 
nonuniform,  time  varying  illumination.118  34  371  In  addition,  the  method  is  stable  for  as  little 
as  44%  of  the  overlap  region,1571  and  can  be  implemented  as  a fully  automated  method. 

As  described  in  Section  3.3,  the  Fourier  transformation  registration  method  is 
basically  an  image  auto-correlation  computation.  An  image  registration  problem  can  be 
converted  to  pattern  recognition  by  viewing  the  first  image  frame  as  the  input  and  the 
second  frame  as  the  template.  Therefore,  using  the  pattern  recognition  method  in  this 
dissertation,  stereo  images  can  be  registered  when  there  are  relative  translations,  rotation 
and  scaling  changes.  It  should  be  pointed  out,  however,  that  the  method  in  this 
dissertation  can  not  yet  deal  with  the  off-plane  rotation.  The  effort  to  extend  the  method 
to  camera  movement  with  the  six  degrees  of  freedom  is  underway. 

5.2.4.  Pattern  Recognition 

The  pattern  recognition  module  provides  the  information  about  what  objects  are 
present  in  the  scene  and  where  the  objects  are  located.  One  advantage  of  this  pattern 
recognition  method  is  its  effectiveness  for  objects  such  as  tanks  and  airplanes,  which  is 
particularly  helpful  to  feature  extrusion  because  the  reconstruction  of  these  objects 
exclusively  using  stereo  vision  is  very  difficult  due  to  their  small  sizes  and  complicated 
shapes. 
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The  procedure  using  pattern  recognition  to  identify  the  objects  together  with  their 
locations  and  orientations  is  illustrated  in  Figure  5.5.  Given  a pair  of  stereo  images  and  a 
template  of  the  interest  object,  performing  the  pattern  recognition  to  the  left  and  right 
images  respectively  results  in  a set  of  orientation  and  location  information  for  each  image. 
Calculating  the  disparities  of  the  locations  in  the  left  and  right  images,  the  3D  positions  of 
the  objects  can  be  identified  using  triangulation.  Combined  with  the  orientation,  the 
information  about  what,  where,  and  how  the  objects  should  be  inserted  in  the  3D  world 
coordinate  system  is  obtained.  In  this  procedure,  stereo  matching  is  fairly  easy  because  the 
centers  of  the  recognized  objects  are  well  separated. 


Figure  5.5.  Pattern  Recognition  Procedure 
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5.2.5.  Stereo  Correspondence 

The  function  of  the  stereo  correspondence  module  is  the  calculation  of  the 
binocular  disparity  for  recovering  the  elevation  data  from  the  scene.  There  are  basically 
two  kinds  of  methods  for  stereo  matching:  area-based  and  feature-based  approaches.  The 
feature-based  approach  has  become  popular  since  the  mid-1980’s.  The  primary  advantage 
of  this  approach  is  that  it  can  significantly  reduce  the  amount  of  information  needed  by 
processing  the  features,  i.e.  edges.  However,  the  3D  information  provided  using  this 
method  is  sparse  and  inaccurate.  Particularly  for  imagery  such  as  open  terrain  that  does 
not  contain  enough  features,  the  feature-based  approach  cannot  precisely  describe  the 
terrain.  Moreover,  to  date  there  still  is  no  reliable  method  for  feature-based  disparity 
matching.1581 

The  alternative  to  the  feature-based  method  is  an  area-based  approach.  Compared 
with  the  edge-based  method,  this  approach  provides  a dense  depth  map  and  can  be  used 
for  identifying  smooth  surfaces.  This  approach  can  be  made  faster  by  using  multiscale  and 
parallel  implementations.  It  is  also  very  robust  with  respect  to  noise  and  illumination 
changes.  While  this  approach  performs  well  on  smooth  surfaces,  it  gives  very  poor  results 
near  depth  boundaries.1591  However,  this  disadvantage  is  not  of  concern  in  this  research 
because  the  task  of  reconstructing  buildings  where  boundaries  are  critical  is  avoided  by 
inserting  the  pre-modeled  features. 

The  stereo  matching  method  in  this  research  is  the  least  square  optimization  that  is 
similar  to  Matthies’  work.1601  One  practical  concern  for  the  area-based  approach  is  the  size 
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of  the  correlation  window.161"631  In  this  research  a size  of  10x10  pixels  is  selected  for 
reducing  the  computation  time.  The  matching  pair  is  determined  using  the  normalized 
cross-correlation.  Once  the  disparity  map  is  obtained,  the  terrain  elevation  data  can  be 
calculated  using  space  triangulation. [64] 

5.2.6.  Database  Modeling  And  Realistic  3D  Feature  Rendering 

The  goal  of  the  feature  modeling  module  is  to  build  a database  for  features  of 
interest.  The  database  will  provide  templates  for  pattern  recognition  and  prepare  3D 
models  for  the  feature  insertion.  The  basic  functions  of  the  feature  modeling  module 
include  generating  gray  scale  images  of  multiple  perspective  views  for  an  object,  and 
providing  detailed  3D  models  in  the  form  of  a file  for  feature  insertion.  This  research 
employs  IGRIP  as  the  platform  for  modeling.  The  advantages  of  using  IGRIP  include  easy 
use,  compatibility  with  the  3D  scene  rendering  module,  and  realistic  texture  and  color. 

The  feature  rendering  module  is  for  the  reconstruction  of  a realistic  3D  scene  by 
draping  a high  resolution  image  onto  the  terrain  elevation  data  and  then  inserting  the 
present  features  into  the  scene.  The  software  platform  for  this  module  is  RenderMan™ 
produced  by  Pixar.  This  module  accomplishes  rendering  in  two  steps.  First,  it  builds  a 3D 
mesh  of  the  terrain  elevation  data  and  then  drapes  the  gray  scale  values  of  the  original 
image  onto  the  mesh  to  form  a terrain  model.  Second,  it  converts  an  IGRIP  model  to  a 
RIB  (RenderMan  Interface  Bytestream)  file  format  acceptable  by  RenderMan,  and  then 
combines  the  object  model  with  the  terrain  model  to  finally  generate  a 3D  scene. 
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5.2.7.  Experimental  Example 

This  section  will  present  an  example  that  demonstrates  how  the  proposed  scheme 
works.  Figure  5.6  is  a stereo  pair  of  aerial  images  of  the  Reitz  Union  area  on  the 
University  of  Florida  campus,  where  there  is  a building  that  is  difficult  to  extrude  using  the 
area-based  stereo  photogrammetry  techniques.  The  first  step  of  feature  extrusion  is  image 
registration.  In  this  example,  a traditional  least-square  correlation  method  is  used,  and  12 
ground  control  points  are  established  by  applying  a aerial  triangulation  technique.  Through 
this  photogrammetric  procedure,  the  interior  orientation  and  relative  orientation  of  the 
camera  model  are  established. 

The  building  in  the  images  is  the  object  of  interest  that  is  pre-modeled  using  IGRIP. 
Figure  5.7  shows  two  different  views  of  the  model.  The  model  contains  the  brick  texture 
of  the  wall,  which  is  information  difficult  to  obtain  from  the  original  imagery  due  to  the 
top  view  point. 

The  recognition  result  is  shown  in  Figure  5.8.  Figure  5.8(a)  is  the  template  generated 
using  the  same  model  as  Figure  5.7,  but  viewed  from  the  top  view  point  which  was  the 
camera  exposure  position.  Figure  5.8(b)  is  the  result  of  displacement  detection  for  the  left 
image  in  Figure  5.6,  where  the  brightest  spot  signifies  the  center  of  the  building.  The 
displacement  from  the  spot  to  the  image  center  is  (16,  13)  in  terms  of  the  pixel  value  in 


the  image  coordinate  system. 


Figure  5.6.  Stereo  image  pair  of  Reitz  Union.  The  author  is  grateful  to  Yishuo  Huang  for  kindly  providing  this  Figure. 
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Figure  5.7.  The  feature  model  viewed  from  different  view  points 
The  author  is  grateful  to  Dean  Haddox  for  generating  this  figure 
and  Figure  5.8  (a). 
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(a)  An  IGRIP  generated  template. 


(b)  The  recognition  result  for  the  left  image  in 
Figure  5.6. 


Figure  5.8.  Pattern  recognition  using  a template  in  the  database. 
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Similarly  pattern  recognition  should  be  applied  to  the  right  image  in  Figure  5.8  using 
the  same  technique  and  the  coordinates  of  the  building  center  can  be  obtained.  Applying 
tri angulation,  the  center  of  the  building  in  the  world  coordinate  system  will  be  calculated. 
For  the  result  about  stereo  correspondence  and  terrain  model,  the  readers  are  referred  to 
the  reference.1651 


5.3.  Discussions 

In  this  chapter  it  has  been  shown  how  to  utilize  the  proposed  recognition  method  for 
the  real-world  computer  vision  problems.  Although  the  preliminary  results  are 
encouraging,  a considerable  amount  of  work  needs  to  be  done  to  make  the  systems  work 
as  proposed. 

First,  the  pattern  recognition  method  has  to  be  extended  to  3D  in  order  to  detect  the 
off-plane  rotation  parameters.  Particularly  for  robotic  applications  where  3D  sensory 
information  is  necessary,  whether  or  not  this  method  can  be  extended  to  3D  determines 
its  usefulness  in  this  most  exciting  application  area. 

Next,  to  make  a deliverable  system,  many  problems  other  than  pattern  recognition 
must  be  solved,  such  as  camera  calibration,  real  image  rectification,  object  reconstruction, 
geometric  primitive  recognition,  stereo  correspondence,  and  automatic  modeling.  With  the 
current  technology  most  of  the  above  problems  are  solved  only  in  some  limited  cases, 
therefore,  consistent  team  work  must  be  pursued  to  finally  produce  a practical  system. 


CHAPTER  6 
CONCLUSIONS 


The  four-dimensional  hyper-image  space  combined  with  the  measurement  function 
defined  by  phase-only  filtering  provides  a new  concept  for  object  recognition.  The 
recognition  scheme  presented  in  this  dissertation  is  mathematically  well  defined,  which 
guarantees  the  success  of  recognition  within  the  scope  of  planar  distortions.  The 
implementation  of  the  described  scheme  is  very  simple.  Utilization  of  a parallel  computer 
together  with  a multiresolution  technique  improves  the  processing  speed  significantly.  The 
advantages  of  using  this  method  include  (1)  recognition  of  objects  with  arbitrary  shape  and 
form;  (2)  detection  of  the  movement  parameters  while  recognizing  objects;  (3) 
insensitivity  to  occlusion,  noise  and  luminance  changes;  and  (4)  fast  processing  speed.  The 
fast  speed,  robustness,  and  ability  to  process  raw  data  make  the  recognition  scheme 
valuable  to  a broad  spectrum  of  applications.  Two  direct  applications  of  the  scheme  are 
autonomous  object  recognition  and  image  registration.  Experiments  on  many  images  show 
robust  results. 

With  the  development  of  this  method,  computer  graphic  modeling  can  be 
integrated  into  computer  vision  systems,  which  provides  a powerful  way  to  utilize  a priori 
information.  The  proposed  systems  for  virtual  world  updating  and  imagery-based  virtual 
reality  demonstrated  the  potential  of  the  method.  This  research  opens  a new  avenue  to 
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many  traditional  computer  vision  problems.  It  will  renew  research  interests  in  the 
correlational  recognition  method. 

Although  the  proposed  scheme  seems  very  promising,  further  investigation  to 
extend  it  to  off-plane  rotations  must  be  carried  out.  Only  if  a method  is  able  to  deal  with 
the  off-plane  rotation  problem  for  at  least  the  near-planar  objects,  will  the  method  have  a 
chance  to  lead  towards  a machine  vision  system  which  resembles  the  human  vision  system. 


APPENDIXES 


A.  1 More  Experimental  Results  For  Object  Recognition 
Appendix  A.  1 presents  more  experimental  results  for  object  recognition.  These 
results  in  particular  that  on  the  peak  sharpness,  have  been  used  to  evaluate  the 
effectiveness  of  this  concept.  There  are  six  figures  where  the  objects  exist  in  the  first  two 
and  are  absent  in  the  rest.  In  all  figures,  (a)  and  (b)  are  in  a size  of  512  x 512  pixels  and 
(c)  and  (d)  illustrate  the  results  at  the  level  of  64  x 64  pixels,  although  their  display  sizes 
are  all  the  same.  The  coordinate  systems  in  these  examples  are  as  same  as  described  in 
Chapter  4.  In  the  cases  where  objects  are  not  present,  translation  detection  results  should 
be  ignored  because  they  do  not  have  any  geometrical  meaning. 

It  is  interesting  to  compare  Figure  A.2  with  A. 3.  In  these  figures,  the  input  scene  is 
the  same  except  in  Figure  A.3(a)  the  object  is  taken  away.  The  peak  sharpness  value 
jumps  significantly  from  20.950  when  the  object  is  present  to  25.419  when  the  object  is 
absent.  This  example  demonstrates  the  effectiveness  of  the  peak  sharpness  as  a criterion  to 
judge  the  existence  of  an  object. 
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(c)  Rotation  and  scaling  (d)  Translation  detection 

detection 


Peak  Sharpness:  PCE  = 20.023, 
Rotation:  0 = 0, 
Scaling:  k = 1 , 

Translations:  jc  = 12,  y = -14. 


Figure  A.l. 


85 


(a)  Scene  image 


, i 

I 

HP 

(c)  Rotation  and  scaling 
detection 

(d)  Translation  detection 


Peak  Sharpness:  PCE  = 20.950, 
Rotation:  0 = 0, 
Scaling:  k = 1 , 
Translations:  x = 1 , y = 3. 


Figure  A.2. 
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(a) Input  scene 


(b)  Template 


(c)  Rotation  and  scaling  detection 


Peak  Sharpness:  PCE  =25.419 


Figure  A.3. 
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(a)  Input  Scene 


(b)  Template 


(c)  rotation  and  scaling  detection 


(d)  translation  detection 


Peak  Sharpness:  PCE  = 27.664 


Figure  A.4. 
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(a) Input  scene 


(c)  Rotation  and  scaling  detection 


_ ■ 


(d)  Translation  detection 


Peak  Sharpness:  PCE  = 29.550 


Figure  A.5. 
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(a) Input  scene 


(b)  Template 


(c)  rotation  and  scaling  detection 


Peak  Sharpness:  PCE  = 25.200 


Figure  A.6. 
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A.2  C Codes  for  the  Implementation  of  the  Recognition  Method 
This  section  provides  C programs  to  show  how  the  recognition  method  is 
implemented.  For  the  readers’  convenience,  each  program  is  completed  with  the  major 
computing  subroutines  as  well  as  standard  input  and  output  subroutines.  Many  subroutines 
listed  here  are  either  handouts  or  homework  from  my  Computer  Vision  class  taught  by  Dr. 
Andrew  Laine.  The  Read/Write  image  subroutines  are  written  by  Patric  Wunsch.  The 
author  is  very  grateful  to  them. 


A.2.1  Test  image  generation 


This  program  generate  a square  image  similar  to  Figure  4.1(a).  This  image  is 
helpful  for  the  readers  to  test  their  own  implementation.  The  input  to  the  program  is  the 

image  size,  and  the  output  is  a raw  image. 

/*  Make  a square  image  */ 

♦include  <fcntl.h> 

#include  <stdio.h> 

♦include  <math.h> 

unsigned  char  *A; 
float  dy,  dx,  h; 

♦define  WHITE  250; 

♦define  BLACK  0;  /*  Constants  for  brightness  */ 

/*  Usage:  make_sqr  [size]  image. out  */ 

main (argc, argv) 
int  argc; 
char  *argv [ ] ; 

{ 

int  i,  j,fdl; 
int  N; 

fdl  = open (argv [2 ] , 0_WR0NLY | 0_CREAT) ; 

N = atoi (argv [1] ) ; /*  Size  of  square  image  */ 

A = (unsigned  char  *)  (malloc ( ( (N) ) * ( (N) ) ) ) ; 

for  (i=100;  i<200;  i++)  { 

for  ( j=100;  j<  (200) ; j++)  { 
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} 


} 


* (A+(i* (N) )+j)  = WHITE; 


write (fdl.  A,  N*N) ; 


A.2.2  Basic  matching 


This  program  is  written  according  to  the  algorithm  described  in  Section  3.1.  The 
input  to  the  program  include  the  image  size,  scene,  and  template.  Both  the  scene  and 
template  are  in  the  format  of  a float  binary  file.  The  output  includes  an  image  for  rotation 
and  scaling  detection  and  an  image  for  translation  detection.  The  former  is  called 
“mushroom”,  which  is  a float  complex  binary  file.  The  latter  is  called  “field”,  which  is  a 
float  binary  file. 

/★★*★***★★***★★*★*★********★********★**************★★***★*★★********* 
The  template  matcher. 

Usage:  mushgnt  <2dsize>  <2dsize>  <2dsize>  input  "mushroom"  kernal 

September  28,  1994 

Sharon  Wang 
CIMAR,  UF. 

**★★***★*★**★**********★★★★*************************************.******/ 


♦include  <stdio.h> 

♦include  <strings.h> 

♦include  <sys/file.h> 

♦include  <math.h> 

void  fft (float**,  float**,  int,  int,  int,  int); 

void  logTransfer (void) ; 

void  polarTransfer (int,  int); 

float  bilinearltrpl (int,  int,  float,  float); 

void  usage (void) ; 

void  get_parameters ( int,  char*  argv[]  ) ; 
void  **array2(  int,  int,  int  ); 
void  read_image ( char*,  int,  float**  ) ; 
void  write_image ( char*,  int,  float**); 

int  N=32;  /*  size  of  the  original  image  */ 

int  Ntheta=32;  /*  angle  size  of  the  resulted  image  */ 

int  Nr=32;  /*  radius  size  of  the  resulted  image  */ 

char*  im_file; 
char*  out_file; 
char*  ker  file; 
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int  fd; 

float  **carte,  **polar,  **plog; 
float  dtheta; 
double  dr; 

int  bpp;  /*  bytes  per  pixel  */ 

void  main ( int  argc,  char*  argv[]  ) 

{ 

float  **xr,  **xi,  **kerxr,  **kerxi,  **field,  **hold; 
int  xsize  = 32,  ysize  = 32; 
int  MFLAG  = 0; 
int  IFLAG  = 0; 
int  nchan  =1; 
int  i,  j,k,  1; 
float  *mushroom, *pout ; 
float  temp, real, imag, kreal, kimag, maxd; 
int  max_i,  max_j; 

get_parameters ( argc,  argv  ) ; 

ysize  =N; 

xsize=N; 


bpp  = sizeof (float) ; 

if(  ! (carte=  (float**)  array2 (N,  N,  sizeof (float) ))  || 

! (polar= (float**)  array2 (Nr,  Ntheta,  sizeof (float) )) || 

! (plog= (float**)  array2 (Nr,  Ntheta,  sizeof (float) ) ) || 

! (xi  = (float  **) array2 (xsize,  xsize,  bpp) ) || 

! (xr  = (float  **) array2 (xsize, xsize, bpp) ) || 

! (kerxi  = (float  **) array2 (xsize, xsize, bpp) ) || 

! (kerxr  = (float  **) array2 (xsize, xsize, bpp) ) || 
'.(field  = (float  **) array2 (xsize, xsize, bpp) ) jj 
! (hold  = (float  **) array2 (xsize, xsize, bpp) ) |j 
[(mushroom  = (float  *)malloc (bpp*xsize*xsize*2) ) ) 
{ fprintf(  stderr,  "can't  allocate  image  buffers\n"); 
exit  (1) ; } 


/*  Generate  the  kernal  mushroom  */ 


*/ 


read_image ( ker_file,  N,  carte); 

polarTransfer (N/2,N/2) ; /*  set  the  point  at  the  center 

logTransfer () ; 

for  (k=0;  k<Nr;  k++) 

for  (1=0;  KNtheta;  1++)  { 

kerxr [k] [l]=plog[k] [1]; 

} 


fft (kerxr,  kerxi,  IFLAG,  MFLAG,  xsize,  nchan);  /*  fft  */ 
/*  loop  of  check  all  mushrooms  */ 

read_image ( im_file,  N,  carte); 
pout=mushroom  ; 

for  (i=0;  i<N;  i++)  /*  loop  of  field  */ 

for  ( j =0 ; j<N;  j++)  { 

for  (k=0;  k<Nr;  k++) 

for  (1=0;  KNtheta;  1++)  { 

polar [k] [ 1 ] =0 ; 
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} 


plog [k] [1] =0; 


polarTransfer (i, j) ; /*  make  a mushroom  */ 

logTransfer () ; 

for  (k=0;  k<Nr;  k++) 

for  (1=0;  KNtheta;  1++)  { 

xr [k]  [1] =plog [k]  [1] ; 

} 

IFLAG=0;  nchan=l; 

f ft (xr,  xi,  IFLAG,  MFLAG,  xsize,  nchan) ; /*  fft  */ 

for  (k=0;  k<Nr;  k++) 

for  (1=0;  KNtheta;  1++)  { 

kreal=kerxr [k] [1]; 
kimag=kerxi [k]  [ 1 ] ; 
real=xr (k]  [1] ; 
imag=xi [k] [1] ; 

temp  = sqrt (kreal*kreal  + kimag  * kimag) ; 

if  (temp>0 . 00001)  { /*  complex  division  */ 

xr [k] [1] = (real*kreal+imag*kimag) /temp; 
xi[k] [1]= (imag*kreal-real*kimag) /temp; } 
else  { 

xr[k] [1] = 0; 
xi[k] [1] = 0; } 

} 

IFLAG=1 ; nchan=2; 

fft (xr,  xi,  IFLAG,  MFLAG,  xsize,  nchan);  /*  ifft  */ 

/*  get  the  max  for  whole  field  */ 
maxd=0; 

for  (k  = 0;  k<N;  k++) 
for  (1  = 0;  1<N;  1++) 

( 

temp  = (xr[k] [l]*xr[k] [l]+xi[k] [l]*xi[k] [1] ) ; 
hold [k] [1] =temp; 

if  (temp>maxd) 
maxd=temp; 

} 

field [i] [j]=maxd; 
if (i==32  &&  j==32) 
write_image ("mushroom2",  N,  hold) ; 

} /*  end  of  the  main  loop  */ 

/*  Pick  up  the  mushroom  with  the  max  value  */ 

maxd=0; 

for  (k  = 0;  k<N;  k++) 
for  (1  = 0;  1<N;  1++) 

{ 

if  (field [k] [1] >maxd)  { 
maxd=field[k] [1] ; 
max_i=k; 
max_j=l;  } 


for  (k=0;  k<Nr;  k++) 

for  (1=0;  KNtheta;  1++)  { 

polar [k] [1] =0; 
plog [k] [1] =0; 

} 
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polarTransf er (max_i,  max_j ) ; 
logTransfer ( ) ; 

for  (k=0;  k<Nr;  k++) 

for  (1=0;  KNtheta;  1++)  { 

xr[k] [1] =plog [k] [1]; 

} 


IFLAG=0;  nchan=l; 

f ft (xr,  xi,  IFLAG,  MFLAG,  xsize,  nchan) ; 

for  (k=0;  k<Nr;  k++) 

for  (1=0;  KNtheta;  1++)  { 

kreal=kerxr [k] [1]; 
kimag=kerxi [k]  [1]; 
real=xr [k] [1] ; 
imag=xi [k]  [1] ; 

temp  =sqrt (kreal  * kreal  + kimag  * kimag) ; 

if  (temp>0 . 00001)  { /*  complex  division  */ 

xr [k] [1] = (real*kreal+imag*kimag) /temp; 

xi[k] [1] = (imag*kreal-real*kimag) /temp;  } 

else  { 

xr [k] [1]=  0; 

xi[k] [1]=  0;} 

} 

IFLAG=1 ; nchan=2; 

fft(xr,  xi,  IFLAG,  MFLAG,  xsize,  nchan); 

for  (k=0;  k<Nr;  k++) 

for  (1=0;  KNtheta;  1++)  { 

*pout++=xr [k] [1]; 

*pout++=xi [k] [lj; 

} 


} 


fd=open (out_file,  0_CREAT  | 0_WR0NLY,  0644); 
write (fd,  (float*) mushroom,  N*N*sizeof (float) *2) ; 
close (fd) ; 

write_image ("field",  N,  field); 


void  logTransfer (void) 

< 

int  k,  1; 
int  nt,  nr; 
float  r,  rx,  tr; 
float  f; 

dr=log ( (double) (N-l) ) / (double)N; 
for  (nr  = 0;  nr<Nr;  nr++)  { 

rx=(float)  exp(  (double) nr*dr) ; /*  coordinates  in  polar  plane  */ 

k=  (int)rx; 

tr  = rx  - (float)  k; 

for  (nt  = 0;  nt<Ntheta;  nt++)  { 

f=polar [k] [nt] +tr* (polar [k+1] [nt] -polar [k] [nt] ) ; 
plog [nr] [nt]=f; 

} 

} 

} 


void  polarTransf er (int  m,  int  n) 

{ 
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int  k,  1;  /*  m, 1 mean  y,  and  n,  k mean  x */ 

int  nt,  nr; 

float  theta,  r,  xx,  yy,  tx,  ty; 

dtheta=2 . 0 *M_P I / ( float )Ntheta; 

for  (nt=0;  nt<Ntheta;  nt++)  { /*  loop  of  theta  */ 

theta  = -M_PI+ (float) nt*dtheta;  /*  range  from  -pi  to  pi  */ 
for  (nr  = 0;  nr<Nr;  nr++)  { 

r= (float) nr/2; 

xx=r*cosf (theta) ; /*  calculate  new  coordinates  */ 

yy=r*sinf (theta) ; 

k=  (int) xx  + n ; /*  index  */ 

1=  (int)yy  + m ; 

tx  = xx  - (float) (int) (xx) ; /*  difference  for  interpolation 

*/ 

ty  = yy  - (float) (int) (yy); 

if  (xx  <0)  { tx  = 1.0+tx;}  /*  for  correct  index  */ 
if  (yy  <0)  { ty  = 1.0+ty;} 
if  (xx>0)  {k=k+l;} 
if  (yy>0)  (1=1+1; } 

if  (k>=l  &&  1>=1  &&  k<N  &&  1<N  ) ( 
polar [nr] [nt] =bilinearltrpl (k, 1, tx, ty) ; ] 


float  bilinearltrpl (int  i,  int  j,  float  rx,  float  ry) 

( 

float  f; 

f=(carte[j] [ i— 1 ] -carte [ j-1]  [i-l])*ry 
+ (carte [j-1] [i] -carte [j-1] [i-l])*rx 
+(carte[j] [i] +carte [ j-1] [ i— 1 ] 

-carte[j-l]  [i]-carte[j]  [ i— 1 ] ) *rx*ry+carte [ j-1]  [i-1]; 
return  f; 

} 


void  usage (void) 

{ 

fprintf(  stderr,  "\tusage:  edges  <size>  <scale>  <image  file>  [i]\n"  ); 
exit ( 1 ) ; 


void  get_parameters ( int  argc,  char*  argv [ ] ) 

( 

if(  argc  < 7 ) usage (); 

N=atoi ( argv[l]  ); 
if ( N<4  | | N>2048  ) 

( fprintf ( stderr,  "invalid  image  size:  %s  \n",  argv[l]  );  usage (); 

Nr=atoi(  argv [2]  ); 

Ntheta=atoi ( argv [3]  ); 
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im_file  = argv[4]; 
out_file  = argv[5]; 
ker_file  = argv[6]; 


/★★★★★★**★★★★★*★★★★★★★★★**★★****★**★*★★★★*★*****★*******★*★★***★*★*★** 
/*  * 

/ * read_image : * 

/*  reads  the  raw  image  (i.e.  8bit  gray  scale,  size  NxN)  "im  file"  * 

/*  and  stores  it  in  floating  point  format  in  buffer  s.  Memory  for  * 

/*  must  be  allocated  beforehand.  * 

/*  * 

/★★★★★★★******★*★★*★★★***★**★*****★*★★**★**★********************★*★★*★ 


/ 

/ 

/ 

/ 

/ 

/ 

/ 

/ 


void  read_image ( char*  im_file,  int  N,  float**s  ) 

{ 

int  i,  j,  buf_len; 
float *buf; 

FILE  *f ; 


if ( !(f=fopen(  im_file,  "r"  ))  ) 

{ fprintf(  stderr,  "can't  open  file  %s\n",  im_file  );  exit ( 1 );  } 

buf_len=N*sizeof ( float); 

if(  ! (buf  = (float*)  malloc(  buf_len  ))  ) 

{ fprintf(  stderr,  "can't  allocate  buffers\n"  );  exit ( 1 );  } 

f or ( i=0;  i<N;  i++  ) 

{ 

fread(  buf,  sizeof(  float),  N,  f ); 
for ( j=0;  j<N;  j++  ) { . 
s[i] [j]=  buf [j]; 

} 

} 


f close  ( f ) ; 
free ( buf  ) ; 


void  write_image ( char*  im_file,  int  N,  float**s  ) 

{ 

int  i,  j,  buf_len; 
float *buf ; 

FILE  *f ; 

if(  !(f=fopen(  im_file,  "w"  ))  ) 

{ fprintf(  stderr,  "can't  open  file  %s\n",  im_file  );  exit ( 1 );  } 

buf_len=N*sizeof ( float) ; 

if(  ! (buf  = (float*)  malloc(  buf_len  ))  ) 

( fprintf ( stderr,  "can't  allocate  buffers\n"  );  exit ( 1 );  } 

f or ( i=0;  i<N;  i++  ) 

{ 

for ( j=0;  j<N;  j++  ) 
buf [ j]=s [i] [ j] ; 

fwrite ( buf,  sizeof ( float),  N,  f ) ; 

} 

fclose  ( f ) ; 
free ( buf  ) ; 
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void  **array2(  int  X,  int  Y,  int  elem  size) 

{ 

void  *data; 
void  **  ptr; 
int  row_bytes,  i; 

if ( ! (data  = (void  *)malloc(  X * Y * elem_size  ))  ) 
return ( NULL  ) ; 

if(  ! (ptr  = (void  **)  malloc (sizeof ( void  * ) * X ))  ) 
return ( NULL  ) ; 

row_bytes  = Y * elem_size; 

f or ( i=0;  i<X;  i++  ) 

ptr[i]  = (void  *)((int)data  + i * row_bytes) ; 
return ( ptr  ) ; 


/* 

* fft  - do  fft  Andrew  Laine 


*/ 

♦include  <stdio.h> 

♦include  <strings.h> 

♦include  <sys/file.h> 

♦include  <math.h> 

fft (float  **  xr,  float  **  xi,  int  IFLAG,  int  MFLAG, int  xsize,  int  nchan) 

int  a,  p,  q,  i,  j,  inc,  len,  im; 

float  *buf,  *pbuf,  *co,  *si,  nsqr; 

int  *butterfly; 

int  bpp  = sizeof (float) ; 

if  ( ( (buf  = (float  *) (malloc (bpp* (len=xsize) *2) ) ) ==  0)  || 

((co  = (float  *) (malloc (bpp*len) ) ) ==  0)  || 

((si  = (float  *) (malloc (bpp*len) ) ) ==  0)  || 

((butterfly  = (int  *)  (malloc (2*xsize*sizeof (int) )) ) ==  0))  { 
printf ("insufficient  memory  for  fft  \n") ; 
exit  (3) ; 

} 

if (IFLAG)  inc  = 1; 

else  inc  = -1; 

im  = 1; 

for  (i  =0;  i < xsize;  i++)  { 
for  (j  =0;  j < xsize;  j++)  { 
xr[i] [j]  = xr [i] [ j] *im; 
if  (nchan  ==  1)  xi[i][j]  = 0.0; 
else  xi [i] [ j]  = xi[i][j]*im; 
if  (! MFLAG)  im  = -im; 

} 

if  ( ! MFLAG)  im  = -im; 

} 

f ft2d (xr, xi, co, si, butterfly, xsize, inc) ; 

nsqr  = xsize  * xsize  ; 

im  = 1; 

for  (i  = 0;  i < xsize;  i++)  { 
if  (IFLAG)  { 
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for  (j  =0;  j < xsize;  j++)  { 
xr [i] [ j]  = xr[i] [j]/nsqr*im; 
xi[i] [j]=  xi [i] [j]/nsqr*im; 
if  ( ! MFLAG)  im  = -im; 

} 

} 

else  { 

for  (j  =0;  j < xsize;  j++)  { 
xr [i] [ j]=  xr [i] [ j] *im; 
xi [i] [ j]=  xi [i] [ j] *im; 
if  (1MFLAG)  im  = -im; 

} 

} 

if  (’MFLAG)  im  = -im; 

} 

} 

f ft2d (xr, xi, co, si, butterfly, n, inc) 

/*  Based  on  the  algorithm  by  E.  Hoyer  and  W.  Berry 

* For  better  performance,  the  tables  of  cosines,  sines  and 

* butterfly  are  precalculated  and  stored. 

* xr  and  xi  are  the  real  and  the  imaginary  parts  of  the  image  to 

* be  transformed. 

* co  and  si  are  the  cosines  and  sines  tables. 

* butterfly  is  the  table  of  butterfly  (bit  reversal) 

* n is  the  dimension  of  the  nxn  image 

* inc  is  1 for  forward  transform  and  -1  for  inverse  transform. 

*/ 

float  **xr,  **xi,  co[],si[]; 

int  n,  inc; 

int  butterfly  [ ] ; 

{ 

register  int  i,  j,r,  k,p; 

int  n2,  nb,  lk.2,  lp2,  ip,  ku,  kl,  k2,  al,  a2,  a3,  a4; 
float  pi, wn, iwn, fn, fr, fine, tempr, tempi; 

float  terml,  term2,  term3,  term4,  term5,  term6,  term7,  term8; 
double  cos (), sin () , log ( ) , floor ( ) , pow ( ) ; 
int  il,i2,c; 
int  brev ( ) ; 

pi  = MJPI; 

fn  = n; 
fine  = inc; 

nb  = floor  ( (double)  (0 .5+ (log  ( (double)  fn) /log  (2 . 0) ))) ; 
n2  = n/2; 

wn  = (-2.0  * pi  * fine)  / fn; 

for  (i  = 0;  i < n;  i++)  { 
iwn  = i*wn; 

co[i]  = cos ( (double) iwn) ; 
si[ij  = sin ( (double) iwn) ; 

} 

for  (r  = 1;  r <=  nb;  r++)  { 
fr  = r; 

ip  = floor (0.5  + pow(2.0,fr  - 1.0)); 
ku  = (int)((n  / 2.0)  / ip); 
for  (i  = 0;  i < ip;  i++)  { 
for  (j  = 0;  j < ip;  j++)  { 
il  = (int)  ( (n*i) /ip) ; 
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for  (k  = 0;  k < ku;  k++)  { 
lk2  = ip*k; 

±2  = (n* j ) / ip; 
for  (p  = 0;  p < ku;  p++)  { 
lp2  = ip*p; 
terml  = xr[il][i2]; 
term2  = xr [il] [i2+ku] ; 
term3  = xr[il+ku]  [ ±2 ] ; 
term4  = xr [il+ku] [i2+ku] ; 
term5  = xi[il][i2]; 
term6  = xi [il] [i2+ku] ; 
term7  = xi [il+ku] [i2] ; 
term8  = xi [il+ku] [i2+ku] ; 

xr[il] [i2]  = (terml  + term2  + term3  + term4) ; 
xi[il] [i2]  = (term5  + term6  + term7  + term8) ; 
tempr  - terml  - term2  + term3  - term4; 
tempi  = term5  - term6  + term7  - term8; 


xr [il] [i2+ku] 

= 

tempr 

★ 

co [lp2 ] 

- tempi  * 

si [lp2] 

xi  [il ] [i2+ku] 

= 

tempr 

★ 

si [ lp2 ] 

+ tempi  * 

co [ lp2 ] 

tempr  = terml 

+ 

term2 

- 

term3  - 

term4; 

tempi  = term5 

+ 

term6 

- 

term7  - 

term8; 

xr [il+ku] [i2] 

= 

tempr 

★ 

co [lk2] 

- tempi  * 

si [lk2] 

xi [il+ku] [i2] 

= 

tempr 

★ 

si [lk2] 

+ tempi  * 

co [lk2] 

tempr  = terml 

- 

term2 

- 

term3  + 

term4; 

tempi  = term5 

term6 

- 

term7  + 

term8; 

xr [il+ku] [i2+ku]  = tempr  * co[lp2+lk2]  - tempi  * 
xi [il+ku] [i2+ku]  = tempr  * si[lp2+lk2]  + tempi  * 
i2++ ; 

} 

il++; 

} 


} 

} 

for  (i  = 0;  i < n;  i++) 

butterfly [i]  = brev(i,nb); 

for  (i  = 0;  i < n;  i++)  { 
kl  = butterfly [i] ; 
for  (r  = 0;  r < n;  r++)  { 
k2  = butterfly [r] ; 
al  = i*n+r; 
a2  = kl*n+r; 
a3  = i*n+k2; 
a4  = kl*n+k2; 
if  (i  < kl)  { 

tempr  = xr [i] [r] ; 
tempi  = xi [i] [r] ; 
xr  [i]  [r]  = xr  [kl]  [k2] ; 
xi  [i]  [r]  = xi  [kl]  [k2] ; 
xr[kl] [k2]  = tempr; 
xi[kl][k2]  = tempi; 

} 

if  ( (i  ==  kl)  & (r  < k2) ) < 
tempr  = xr [i] [r] ; 
tempi  = xi [i] [r] ; 
xr [i]  [r]  = xr  [i]  [k2]  ; 
xi  [i]  [r]  = xi  [i]  [k2]  ; 
xr[i][k2]  = tempr; 
xi [i] [k2]  = tempi; 

} 

} 


si  [Ip2+lk2 ] ; 
co [Ip2+lk2 ] ; 


} 
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brev (w, nb) 
int  w,nb;  { 

register  int  j,k; 
k = 0; 

for  (j=0;  j<nb;  j++) 

k = (k  «1)  | ( (w  » j)  & 0001) ; 

return  (k) ; 


A.2.3  Peak  sharpness  test 

This  program  is  for  the  calculation  of  the  peak  sharpness  value.  The  input  includes 
the  image  size  and  the  subspace  correlation  coefficients,  which  must  be  normalized  to  a 
grayscale  image  in  the  raw  format.  The  output  is  the  peak  sharpness  value  printed  on  the 


screen. 


to  calculate  the  peak  sharpness 
usage:  peak  <size>  infile 
Sharon  Wang 
December  1994 


CIMAR,  UF 

************************************************************************ 

*/ 

♦include  <stdio.h> 

♦include  <strings.h> 

♦include  <math.h> 


void  read_image(  char*,  int,  float  **  ); 
void  **array2 ( int,  int,  int  ) ; 
void  usage (void) ; 

void  get_parameters  ( int,  char*  argv[]  ); 


int  N=64; 
float  n=4.0; 
char*  in_fn; 
char*  out_fn; 

void  main (int  argc,  char*  argv[]) 

1 


/*  image  size  */ 

/*  number  of  max  */ 

/*  image  name  */ 

/*  kernal  image  name  */ 


float  ** image; 
int  i, j ; 

float  maxi, temp, total,  sharpness; 

get_parameters ( argc,  argv  ) ; 
parameters  */ 


/*  check  for  invaild 
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if ( ! (image= (float  **)  array2 ( N,  N,  sizeof ( float  ) ))  ) 

{ fprintf(  stderr,  "can't  allocate  image  buffers\n"  );  exit ( 1 );  } 

read_image ( in_fn,  N,  image  ) ; 

maxi=0;  total=0; 

for (i=0; i<N; i++) 

for ( j=0; j<N; j++) { 
temp= image [ i ] [ j ] ; 
temp  *=  temp; 
if (temp>maxi) 
maxi=temp; 
total  +=  temp; 

} 

sharpness=f loglO (total) -floglO (n*maxi) ; 

printf("peak  sharpness  %f\n",  sharpness); 


/★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★it 
/*  * 
/ * read_image : * 
/*  reads  the  raw  image  (i.e.  8bit  gray  scale,  size  NxN)  "im_file"  * 
/*  and  stores  it  in  floating  point  format  in  buffer  s.  Memory  for  * 
/*  must  be  allocated  beforehand.  * 
/*  * 


/ 

/ 

/ 

/ 

/ 

/ 

/ 

/ 


void  read_image ( char*  im_file,  int  N,  float  **s  ) 

{ 


int  i,  j,  buf_len,  ch_size; 
unsigned  char  *buf; 

FILE  *f ; 


if(  !(f=fopen(  im_file,  "r"  ))  ) 

{ fprintf ( stderr,  "can't  open  file  %s\n”,  im_file  ) ; exit ( 1 );  } 

buf_len=N*sizeof (unsigned  char  ); 

if(  ! (buf  = (unsigned  char  *)  malloc ( buf_len  ))  ) 

{ fprintf ( stderr,  "can't  allocate  buffers\n"  );  exit ( 1 );  } 

for(  i=0;  i<N;  i++  ) 

{ 

fread(  buf,  sizeof (unsigned  char  ),  N,  f ); 
for ( j=0;  j<N;  j++  ) 

s [i] [ j]=(float)  buf [ j] ; 

} 


f close  ( f ) ; 
free ( buf  ) ; 


void  usage () 

{ 

fprintf ( stderr,  "\tusage:  peak  <Img  size>  cimage  file>\n"  ); 
exit  ( 1 ) ; 


void  get_parameters ( int  argc,  char*  argv[]  ) 

1 


if ( argc  < 3 ) usage (); 
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N=atoi ( argv[l]  ); 
if ( N<4  | | N>2048  ) 

{ fprintf(  stderr,  "invalid  image  size:  %s  \n",  argv[l]  );  usage(); 


in_fn=argv [2 ] ; 

} 


/★★★★★***★*★★***★******★★**★*****★**•* -it*******************************  / 


/*  */ 

/*  a r r a y 2 — dynamically  allocates  a two  dimensional  array  */ 

/*  SYNTAX:  */ 

/*  type  **array;  */ 

/*  array  = (type  **)  array2 (Xdim,  Ydim,  sizeof  type);  */ 

/*  */ 

/*  is  equivalent  to:  */ 

/*  type  array [Xdim] [Ydim] ; */ 

/*  where  type  is  any  data  type.  */ 

/*  A NULL  pointer  is  returned  if  the  memory  cannot  be  allocated.  */ 

/*  */ 


/*★★******★★*★**★**★****★★★*★**★*★★*★****★★*★******★★*★★******* ****** ^ 

void  **array2 ( int  X,  int  Y,  int  elem_size) 

1 

void  *data; 
void  **  ptr; 
int  row_bytes,  i; 

if(  ! (data  = (void  *)malloc(  X * Y * elem_size  ))  ) 
return ( NULL  ) ; 

if(  ! (ptr  = (void  **)  malloc (sizeof ( void  * ) * X ))  ) 
return ( NULL  ) ; 

row_bytes  = Y * elem_size; 

f or ( i=0;  i<X;  i++  ) 

ptr[i]  = (void  *)( (int) data  + i * row_bytes) ; 
return ( ptr  ) ; 


A.2.4  Multiresolution  implementation 

This  program  is  for  recognition  in  a specified  level  in  the  scheme  of  multiresolution 
implementation.  The  input  includes  the  image  size,  the  coordinates  of  the  object  in  the 
next  coarse  level,  and  the  scene  and  the  template  images.  The  output  is  the  coordinates  of 
the  object  and  the  rotation  and  scaling  detection.  A subroutine  fft  is  called  here,  which  is 
the  same  as  the  one  listed  in  Section  A.2.2. 


/**★*★*********★********★★★************************** 
Multiscale  Implementation. 
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Usage:  step  <2dsize>  <2dsize>  <2dsize>  X Y input  output  Kernal 

September  28,  1994 

Sharon  Wang 
CIAMR,  UF 

***************************************************** i 


#include  <stdio.h> 
finclude  <strings.h> 

#include  <sys/file.h> 

# include  <math.h> 

void  fft (float**,  float**,  int,  int,  int,  int); 

void  logTransfer (void) ; 

void  polarTransfer (int,  int); 

float  bilinearltrpl (int,  int,  float,  float); 

void  usage (void) ; 

void  get_parameters ( int,  char*  argv[]  ); 
void  **array2 ( int,  int,  int  ); 
void  read_image(  char*,  int,  float**  ); 
void  write_image ( char*,  int,  float**); 

int  N=64;  /*  size  of  the  original  image  */ 

int  Ntheta=64;  /*  angle  size  of  the  resulted  image  */ 

int  Nr=64;  /*  radius  size  of  the  resulted  image  */ 

int  X,  Y; 
char*  im_file; 
char*  out_file; 
char*  ker_file; 
int  fd; 

float  **carte,  **polar,  **plog; 
float  dtheta; 
double  dr; 

int  bpp;  /*  bytes  per  pixel  */ 

void  main(  int  argc,  char*  argv[]  ) 

{ 

float  **xr,  **xi,  **kerxr,  **kerxi,  field[2][2],  **hold; 
int  xsize  = 64,  ysize  = 64; 
int  MFLAG  = 0; 
int  IFLAG  = 0; 
int  nchan  = 1; 
int  i, j, k, 1,  ii, j j ; 
float  *mushroom,  *pout; 
float  temp, real, imag, kreal, kimag, maxd; 
int  max_i,  max_j; 

get_parameters ( argc,  argv  ) ; 

ysize  =N; 

xsize=N; 

bpp  = sizeof (float) ; 

if ( ! (carte= (float**)  array2 (N,  N,  sizeof (float) )) || 

! (polar= (float**)  array2 (Nr,  Ntheta,  sizeof (float) )) | | 
! (plog= (float**)  array2 (Nr,  Ntheta,  sizeof (float) ) ) || 
! (xi  = (float  **) array2 (xsize, xsize, bpp) ) | | 

! (xr  = (float  **) array2 (xsize, xsize, bpp) ) || 

! (kerxi  = (float  **) array2 (xsize, xsize, bpp) ) || 

! (kerxr  = (float  **) array2 (xsize, xsize, bpp) ) || 
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! (hold  = (float  **) array2 (xsize, xsize, bpp) ) || 

! (mushroom  = (float  *)malloc (bpp*xsize*xsize*2) ) ) 

{ fprintf  ( stderr,  "can't  allocate  image  buffers\n"); 
exit  (1) ; } 

/*  Generate  the  kernal  mushroom  */ 

read_image ( ker_file,  N,  carte); 

polarTransfer (N/2,N/2) ; /*  set  the  point  at  the  center 

*/ 

logTransfer () ; 

for  (k=0;  k<Nr;  k++) 

for  (1=0;  KNtheta;  1++)  { 

kerxr[k] [l]=plog[k] [1]; 

1 

f ft (kerxr,  kerxi,  IF LAG,  MFLAG,  xsize,  nchan) ; /*  fft  */ 

/*  loop  of  check  all  mushrooms  */ 

read_image ( im_file,  N,  carte); 
pout=mushroom  ; 

for  (i=0;  i<2;  i++)  /*  loop  of  field  */ 

for  ( j=0;  j<2 ; j++)  { 

for  (k=0;  k<Nr;  k++) 

for  (1=0;  KNtheta;  1++)  { 

polar [k] [ 1 ] =0 ; 
plog [k] [1]=0; 

} 

ii=2*Y+l;if  (i==0)  ii=2*Y;  /*  Blowing  up  1 pixel  to  4 */ 

j j=2*X+l; if  ( j==0)  j j=2*X; 

polarTransfer (ii, j j ) ; /*  make  a mushroom  */ 

logTransfer () ; 

for  (k=0;  k<Nr;  k++) 

for  (1=0;  KNtheta;  1++)  ( 

xr [k] [1] =plog [k] [1] ; 

} 

IFLAG=0;  nchan=l; 

f ft (xr,  xi,  IFLAG,  MFLAG,  xsize,  nchan);  /*  fft  */ 

for  (k=0;  k<Nr;  k++) 

for  (1=0;  KNtheta;  1++)  { 

kreal=kerxr [k]  [1]; 
kimag=kerxi [k] [1] ; 
real=xr [k] [1] ; 
imag=xi [k] [1] ; 

temp  = sqrt (kreal*kreal  + kimag  * kimag) ; 

if  (temp>0 . 00001)  { /*  complex  division  */ 

xr  [k]  [1]  = (real*kr.eal+imag*kimag) /temp; 

xi [k] [1] = (imag*kreal-real*kimag) /temp; } 

else  { 

xr [k] [1]=  0; 
xi [k] [1]=  0; } 

} 

IFLAG=1 ; nchan=2; 

fft(xr,  xi,  IFLAG,  MFLAG,  xsize,  nchan);  /*  ifft  */ 
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/*  get  the  max  for  whole  field  */ 
maxd=0; 

for  (k  = 0;  k<N;  k++) 
for  (1  = 0;  1<N;  1++) 

{ 

temp  = (xr[k]  [l]*xr[k]  [l]+xi[k]  [l]*xi[k]  [1]); 
holdtk] [l]=temp; 

if  (temp>maxd) 
maxd=temp; 

} 

field [i] [j]=maxd; 
if (i==64  &&  j==64) 
write_image ("mushroom2",  N,  hold); 

} /*  end  of  the  main  lo.op  */ 

/*  Pick  up  the  mushroom  with  the  max  value  */ 

maxd=0; 

for  (k  = 0;  k<2;  k++) 
for  (1  = 0;  1<2 ; 1++) 

{ 

if  (field [k] [1] >maxd)  { 
maxd=field[k] [1] ; 
max_i=k; 
max_j=l;  } 

} 

if  (max_i==0)  max_i=2*Y;  else  max_i=2*Y+l;  /*  Get  the  max  */ 

if  (max_j==0)  max_j=2*X;  else  max_j=2*X+l; 

print f("\n  max_j=  %d  max_i=  %d",  max_ j , max_i ) ; 

for  (k=0;  k<Nr;  k++) 

for  (1=0;  KNtheta;  1++)  { 

polar [k] [1] =0; 
plogtk] ( 1 ] =0; 

} 


polarTransfer (max_i,max_j) ; 
logTransfer () ; 

for  (k=0;  k<Nr;  k++) 

for  (1=0;  KNtheta;  1++)  { 

xr [k] [l]=plog[k] [1] ; 

} 

IFLAG=0;  nchan=l; 

fft (xr,  xi,  IFLAG,  MFLAG,  xsize,  nchan) ; 

for  (k=0;  k<Nr;  k++) 

for  (1=0;  KNtheta;  1++)  { 

kreal=kerxr [k] [1] ; 
kimag=kerxi [k] [1] ; 
real=xr [k] [1] ; 
imag=xi [k]  [1] ; 

temp  =sqrt (kreal  * kreal  + kimag  * kimag) ; 

if  (temp>0 . 00001)  { /*  complex  division  */ 

xr [k] [1] = (real*kreal+imag*kimag) /temp; 

xi [k] [1] = (imag*kreal-real*kimag) /temp;  } 

else  { 

xr [k] [1]=  0; 
xi [ k ] [1]=  0; } 

} 

IFLAG=1;  nchan=2; 

fft(xr,  xi,  IFLAG,  MFLAG,  xsize,  nchan); 
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for  (k=0;  k<Nr;  k++) 

for  (1=0;  KNtheta;  1++)  { 

*pout++=xr [k] [1] ; 

*pout++=xi [k] [1] ; 

} 

fd=open (out_file,  0_CREAT  | 0_WR0NLY,  0644); 
write (fd,  (float*) mushroom,  N*N*sizeof (float) *2) ; 
close (fd) ; 


void  logTransfer (void) 

{ 

int  k, 1; 
int  nt,  nr; 
float  r,  rx,  tr; 
float  f; 

dr=log( (double) (N-l) ) / (double) N; 
for  (nr  = 0;  nr<Nr;  nr++)  { 

rx= (float)  exp ( (double) nr*dr) ; /*  coordinates  in  polar  plane  */ 

k=  (int)rx; 

tr  = rx  - (float)  k; 

for  (nt  = 0;  ntCNtheta;  nt++)  { 

f=polar [k] [nt] +tr* (polar [k+1] [nt] -polar [k] [nt] ) ; 
plog [nr] [nt]=f; 

} 

} 

} 

void  polarTransfer (int  m,  int  n) 

{ 

int  k,  1;  /*  m, 1 mean  y,  and  n,  k mean  x */ 

int  nt,  nr; 

float  theta,  r,  xx,  yy,  tx,  ty; 

dtheta=2 . 0*M_PI/ (f loat)Ntheta; 

for  (nt=0;  nt<Ntheta;  nt++)  { /*  loop  of  theta  */ 

theta  = -M_P l+( float )nt*dtheta;  /*  range  from  -pi  to  pi  */ 
for  (nr  = 0;  nr<Nr;  nr++)  { 

r= (float ) nr /2; 

xx=r*cosf (theta) ; /*  calculate  new  coordinates  */ 

yy=r*sinf (theta) ; 

k=  (int) xx  + n ; /*  index  */ 

1=  (int)yy  + m ; 

tx  = xx  - (float) (int) (xx) ; /*  difference  for  interpolation 

*/ 

ty  = yy  - (float)  (int)  (yy) ; 

if  (xx  <0)  { tx  = 1.0+tx;}  /*  for  correct  index  */ 
if  (yy  <0)  { ty  = 1.0+ty;} 
if  (xx>0)  [k=k+l; } 
if  (yy>0)  {1=1+1;} 
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if  (k>=l  &&  1>=1  &&  k<N  &&  1<N  ) { 
polar [nr] [nt]=bilinearltrpl (k, l,tx,ty) ; } 


float  bilinearltrpl (int  i,  int  j,  float  rx,  float  ry) 
float  f; 

f= (carte [j] [i— 1] -carte [ j— 1] [i-l])*ry 
+ (carte [ j— 1 ] [i] -carte [j-1]  [i-l])*rx 
+ (carte [j] [i]+carte[ j-1] [i— 1] 

-carte [j-1] [i]-carte[j] [i-1] ) *rx*ry+carte [ j-1] [i-1]; 
return  f; 

} 


void  usage (void) 

{ 

fprintf ( stderr,  "\tusage:  edges  <size>  <scale>  <image  file>  [i]\n"  ); 
exit  ( 1 ) ; 


void  get_parameters ( int  argc,  char*  argv[]  ) 

if ( argc  < 9 ) usage (); 

N=atoi ( argv[l]  ); 
if ( N<4  | | N>2048  ) 

{ fprintf ( stderr,  "invalid  image  size:  %s  \n",  argv[l]  );  usage (); 

Nr=atoi ( argv[2]  ); 

Ntheta=atoi ( argv[3]  ); 

X=atoi(  argv[4]  ); 

Y=atoi(  argv[5]  ); 

im_file  = argv[6]; 
out_file  = argv[7]; 
ker_file  = argv[8]; 


/*********************************************************************, 

/* 

/ * read_image : * / 

/*  reads  the  raw  image  (i.e.  8bit  gray  scale,  size  NxN)  "im  file"  */ 

/*  and  stores  it  in  floating  point  format  in  buffer  s.  Memory  for  */ 

/*  must  be  allocated  beforehand.  */ 

/*  ' 

*/ 

/j********************************************************************/ 


void  read_image(  char*  im_file,  int  N,  float**s  ) 

int  i,  j,  buf_len; 
float*buf ; 

FILE  *f ; 


if ( ! (f=fopen ( im_file,  "r"  ))  ) 

{ fprintf ( stderr,  "can’t  open  file  %s\n",  im_file  ) ; exit  ( 1 );  } 


buf_len=N*sizeof ( float); 
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if ( ! (buf  = (float*)  malloc ( buf_len  ))  ) 

{ fprintf ( stderr,  "can't  allocate  buffers\n"  );  exit  ( 1 );  } 

for(  i=0;  i<N;  i++  ) 

{ 

fread ( buf,  sizeof ( float),  N,  f ) ; 
for ( j=0;  j<N;  j++  ) { 
s [ i ] [j]=  buf [j]; 

} 

} 

fclose  ( f ) ; 
free ( buf  ) ; 


void  write_image(  char*  im_file,  int  N,  float**s  ) 

int  i,  j,  buf_len; 
float *buf; 

FILE  *f ; 

if(  !(f=fopen(  imfile,  "w"  ))  ) 

{ fprintf ( stderr,  "can't  open  file  %s\n",  im_file  );  exit ( 1 );  } 

buf_len=N*sizeof ( float)  ; 

if ( ! (buf  = (float*)  malloc ( buf_len  ))  ) 

{ fprintf ( stderr,  "can't  allocate  buffers\n"  );  exit ( 1 );  } 

for(  i=0;  i<N;  i++  ) 

{ 

for ( j=0;  j<N;  j++  ) 
buf [ j]=s [i] [ j] ; 

fwrite ( buf,  sizeof ( float),  N,  f ) ; 


fclose  ( f ) ; 
free ( buf  ) ; 


void  **array2 ( int  X,  int  Y,  int  elem_size) 

void  *data; 
void  **  ptr; 
int  row_bytes,  i; 

if ( ! (data  = (void  *)malloc(  X * Y * elem  size  ))  ) 
return ( NULL  ) ; 

if ( ! (ptr  = (void  **)  malloc (sizeof ( void  * ) * X ))  ) 
return ( NULL  ) ; 

row_bytes  = Y * elem_size; 

for ( i=0;  i<X;  i++  ) 

Ptr [i]  = (void  *)((int)data  + i * row  bytes); 
return ( ptr  ) ; 
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